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Abstract 

This  paper  presents  numerical  results  for  a  large  and  varied  set  of  problems  using  software 
that  is  widely  available  and  has  undergone  extensive  testing.  The  algorithms  implemented  in 
this  software  include  Newton-based  linesearch  and  trust-region  methods  for  unconstrained  opti¬ 
mization,  as  well  as  Gauss-Newton,  Levenberg-Marquardt,  and  special  quasi-Newton  methods  for 
nonlinear  least  squares.  Rather  than  give  a  critical  assessment  of  the  software  itself,  our  original 
purpose  was  to  use  the  best  available  software  to  compare  the  underlying  algorithms,  to  identify 
classes  of  problems  for  each  method  on  which  the  performance  is  either  very  good  or  very  poor, 
and  to  provide  benchmarks  for  future  work  in  nonlinear  least  squares  and  unconstrained  opti¬ 
mization.  The  variability  in  the  results  made  it  impossible  to  meet  either  of  the  first  two  goals; 
however  the  results  are  significant  as  a  step  toward  explaining  why  these  aims  are  so  difficult  to 
accomplish. 
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1.  Introduction 


The  nonlinear  least-squares  problem  is  that  of  minimizing  a  sum  of  squares 

mi*  -  52  (•»■)’, 

»€•"  2  ^ 


•■1 


in  which  each  is  a  real-valued  function  having  continuous  second  partial  derivatives.  The 
problem  can  also  be  posed  as  a  minimization  of  the  norm  of  a  multivariate  function : 


where 


/  ^i(»)  \ 
^7(r)  1 


We  shall  refer  to  the  function  |  ||/(z)||3  as  the  nonlinear  leaat-aqnares  objective  function.  It  is 
assumed  that  ir  and  m  are  relatively  small,  so  that  algorithms  are  not  formulated  wn'th  any  special 
considerations  for  limited  storage.  A  common  instance  is  the  choice  of  parameters  P  within  a 
nonlinear  model  ip : 


where  d,-  are  observations  at  prescribed  values  r,-. 


Many  specialized  algorithms  have  been  developed  to  take  advantage  of  the  structure  of  the 
nonlinear  least-squares  objective.  This  paper  presents  numerical  results  for  a  large  and  varied 
set  of  problems  using  software  that  is  widely  available  and  has  undergone  extensive  testing. 
The  algorithms  implemented  in  this  software  include  Newton-based  linesearch  and  trust-region 
methods  for  unconstrained  optimization,  as  well  as  Gauss-Newton,  Levenberg-Marquardt,  and 
special  quasi-Newton  methods  for  nonlinear  least  squares.  Rather  than  give  a  critical  assessment 
of  the  software  itself,  our  original  purpose  was  to  use  the  best  available  software  to  compare  the 
underlying  algorithms,  to  identify  classes  of  problems  on  which  the  performance  of  each  method 
is  either  very  good  or  very  poor,  and  to  provide  benchmarks  for  future  work  in  nonlinear  least 
squares  and  unconstrained  optimization.  The  variability  in  the  results  made  it  impossible  to  meet 
either  of  the  first  two  goals.  However,  the  results  are  significant  in  that  they  reveal  a  great  deal 
about  the  reasons  these  aims  why  aims  are  so  difficult  to  accomplish.  For  treatment  of  issues  and 
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methodology  for  software  performance  evaluation  see.  e.  g..  Mor4,  Garbow,  and  Hiilstrom  [1978; 
1981],  Hiebert  [1979;  1981],  and  Hanson  and  Ktogh  [1987].  Hiebert  [1979;  1981]  conducts  an 
extensive  evaluation  of  twelve  programs  for  nonlinear  least  squares,  in  which  she  includes  software 
that  uses  first  derivatives  as  well  as  some  that  does  not.  In  the  present  study,  ail  of  the  software 
requires  first  if  not  second  derivatives  of  the  problem  functions. 

This  paper  is  organized  as  follows.  Section  2  reviews  computational  techniques  for  the  uncon¬ 
strained  optimization  problem.  These  methods  are  of  interest  because  the  nonlinear  least  squares 
problem  is  a  particular  instance  of  unconstrained  optimization,  so  that  special-purpose  algorithms 
for  sums  of  squares  should  compare  favorably  in  performance  with  those  developed  for  the  rrtore 
general  case.  Moreover,  much  of  the  motivation  for  unconstrained  optimization  methods  is  also 
relevant  to  algorithm  development  for  nonlinear  least  squares.  Our  emphasis  is  on  computational 
issues  related  to  the  methods  included  in  this  study.  For  mere  extensive  treatment  of  uncon¬ 
strained  optimization  algorithms,  see  Fletcher  [1980],  Gill,  Murray,  and  Wright  [1981],  Dennis 
and  Schnabel  [1983],  and  Mord  and  Sorensen  [1984].  Section  3  reviews  the  principal  approaches 
that  are  used  in  software  for  nonlinear  least-squares  problems.  These  are  Gauss-Newton  methods; 
Levenberg-Marquardt  methods,  one  of  which  is  implemented  in  the  software  package  MINPACK 
[Mord  (1978),  Mord,  Garbow,  and  Hiilstrom  (1980)];  corrected  Gauss-Newton  methods  [Gill  and 
Murray  (1978)],  which  are  the  basis  for  the  NAO  Library  nonlinear  least-squares  software;  and 
methods  that  form  quasi-Newton  approximations  to  the  term  B  *=  nonlinear 

least-squares  Hessian,  a  strategy  that  is  adaptivdy  combined  with  a  Gauss-Newton  method  and 
a  Levenberg-Marquardt  method  in  the  computer  algorithm  ■L2S0L  [Dennis,  Gay,  and  Welsch 
(1981a,  1981b)l.  We  assume  a  knowledge  of  nurrterical  techniques  for  linear  least-squares  (e.  g., 
Lawson  and  Hanson  [1974],  and  Golub  and  Van  Loan  [1983]).  For  more  information  on  algo¬ 
rithms  specific  to  nonlinear  least-squares  problems,  see  Fraley  [1988]  and  the  references  cited  in 
that  paper.  Section  4  is  a  summary  and  discussion  of  the  numerical  results.  Section  5  contains 
tables  of  all  of  the  results,  as  well  as  information  about  the  software  and  test  problems  used  in 
obtaining  them. 


1.1  Definitions  and  Notation 


We  shall  use  the  following  definitions  and  notational  conventions  : 

•  Generally  subscripts  on  a  function  mean  that  the  function  is  evaluated  at  the  corresponding 
subscripted  variable  (for  example,  /s  =  /(ri,))-  An  exception  is  made  for  the  residual 
functions  where  the  subscript  is  the  component  index  for  the  vector  /. 


•  f  -  The  vector  of  nonlinear  functions  whose  norm  is  to  be  minimized. 

The  nonlinear  ieast>squares  problem  is 

where  the  factor  j  is  introduced  in  order  to  avoid  a  factor  of  two  in  the  derivatives. 


•  6i  '  The  tth  Ttvida»l  function,  also  the  >th  component  of  the  vector  /, 

/  djfr) 

/(i')s  : 

An  alternative  formulation  of  the  nonlinear  least-squares  problem  is 

1  " 
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where  each  ^i(x)  is  a  smooth  function  mapping  97”  to  97, 


•  J  -  The  Jacobian  matrix  of  /. 


7r7 


J(T)  =  V/(T)  = 


9*m 

TfrT 
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•  g  -  The  gradient  of  the  nonlinear  least-squares  objective. 

g(r)  =  V  =  J(T)'^/(T) 


•  B  -  The  part  of  the  Hessian  matrix  of  the  nonlinear  least-squares  objective  that  involves 
second  derivatives  of  the  residual  functions. 

.T 


where 


•  “RiA)  -  The  T»nge  of  A. 

K  A  is  an  m  x  n  matrix,  then 

=  {b  \  At  =  h  for  some  x  6  5?" } 

is  a  subspace  of  Sf*" . 

•  AfiA)  -  The  nn/l  space  of  A. 

If  is  an  in  X  t»  matrix,  then 

Ar(A)={x\Az  =  0} 

is  a  subspace  of  If”.  Af(A)  is  the  orthogonal  complement  of  RiA^)  in  9f*. 


2.  Methods  for  Unconstrained  Optimization 

2.1  Local  Quadratic  Approximation 

Software  for  the  unconstrained  optimization  problem 

min  /■(/). 

is  usually  based  on  successive  minimization  of  a  quadratic  approximation 

for  T(7h  +  p)  —  /‘(r*),  the  change  in  at  The  matrix  is  positive  definite,  so  that  the 
model  Qkip)  has  a  well-defined  minimum 


that  can  be  computed  efficiently.  Positive  definiteness  of  Hk  also  means  that  pa  is  a  descent 
direction  for  T  at  ra.  which  is  essential  for  linesearch  methods  (see  Section  2.2.1).  For  methods 
based  on  (2. hi),  the  condition 


lim 

|p-*no 


l|(//*-'r2:r(,*))pa| 

IIpi^II 
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(2.1.2) 


is  equivalent  to  local  superlinear  convergence  of  the  sequence  {ra  +  Pa}  fo  •n  isolated  local 
minimum  j*  of  T  (see  Dennis  and  Mori  [1974;  1977]).  The  relationship  (2.1.2)  implies  that  the 
step  Pa  approaches  the  Newton  step  in  both  magnitude  and  direction,  although  the  sequence  of 
matrices  {Hk}  need  not  converge  to  V*.?^(x*). 


Section  2.2  is  concerned  with  modifications  that  are  used  to  enforce  convergence  from  an 
arbitrary  starting  point.  These  modifications  fall  into  two  categories:  linesearch  methods  and 
Inisl-region  methods.  Section  2.3  deals  with  the  choice  of  Ifk  in  (2.1 .1 ),  so  that  condition  (2.1 .2) 
for  superlinear  convergence  is  satisfied.  We  discuss  methods  that  use  exact  second  derivatives  as 
well  as  quasi-Newton  approximations. 


2.2  Globalization  Strategies 


2.2.1  Linesenrch  Approach 


Linctearch  methods  obtain  a  new  iterate  in  two  essentially  separate  phases.  First,  a  drsernt 
direction  pk  is  found  for  T',  that  is,  a  vector  pi,  is  computed  for  which 

r/7p*<0.  (2.2.1) 

Condition  (2.3.1)  is  equivalent  to  saying  that  T  initially  decreases  along  the  direction  pi,  from 
Xk.  Various  ways  of  defining  pi  are  discussed  in  Section  2.3.  This  section  is  concerned  with  the 
second  phase  of  a  linesearch  method,  that  of  finding  a  steplength  ni,  satisfying 

+«»?»)< /■(xa),  (2.2.2) 

once  a  descent  direction  is  obtained. 

Because  of  (2.2.1),  condition  (2.2.2)  can  be  satisfied  by  choosing  a  sufficiently  small  value 
of  Ok,  but  the  result  may  not  be  an  appreciable  reduction  in  In  fact,  {ra  +  f*kPk)  'n*y 
converge  to  a  point  that  is  not  a  stationary  point  unless  conditions  stronger  than  (2.2.2)  are 
imposed  on  rra  (see,  e.  g.,  Dennis  and  Schnabel  [1983],  Chapter  6).  On  the  other  hand,  finding  a 
minimum  of  along  pa  *»  sn  iterative  process  which  could  require  many  function  and  derivative 
evaluations.  Steplength  algorithnrw  instead  compute  na  that  satisfies  conditions  sufficient  to 
ensure  convergence  to  a  statiorrary  poirrt  of  T  whenever  the  sequence  [pa)  is  bounded  away 
from  orthogonality  to  the  gradient. 

The  work  of  Goldstwn  [1965;  1967],  Armijo  [1966],  Goldstein  and  Price  [1967],  and  Wolfe 
[1969;  1971],  (see  also  Ortega  and  Rheinboldt  [1970])  established  the  fundamental  principles 
underlying  nrost  steplength  algorithms.  A  simple  strategy  for  sufficient  decrease  is  based  on  the 
condition 

^{xk  +  napa)  -  .^(xa)  <  prraVJ’i^pa,  (2.2. .1) 

for  /I  €  [0, 1).  An  initial  value  (usually  <»>  =  0  "•»  tried  first,  and  then  a  backtracking  strategy  is 
used  to  reduce  it  until  an  admissible  step  is  found.  The  steplength  strategy  of  Gill  et  al.  [1979], 
combines  (2.2.2)  with  the  condition 

\V^{xk  +<»aPa)^Pa|  <  -'l^^kPk,  (2.2.4) 

for  q  e  [0, 1 ),  which  keeps  the  steplength  bounded  away  from  zero  by  forcing  it  to  approximate  a 
local  minimum  of  T  along  pa.  A  procedure  for  one- dimension  a  I  minimization  is  truncated,  using 


6 


(2.2.4)  a<  the  criterion  for  termination.  This  is  accomplished  by  polynomial  interpolation  to  the 
fo"  :lion 


♦(rt)  =  ^{Tk  +  npk). 


(2.2..S) 


together  with  sonte  safeguards  to  prevent  iterates  from  being  either  too  close  together  or  too  far 
apart.  An  exact  minimization  would  be  carried  out  for  i;  =  0  in  (2.2.1),  while  larger  values  of  ri 
increesingty  relax  this  requirement.  When  /i  <  i;,  an  interval  of  steplengths  satisfies  both  (2.2..‘l) 
and  (2.2.1);  if  /r  it  chosen  sufficiently  small,  then  (2.2.2)  almost  always  holds  when  (2.2.4)  does. 
When  /r  >  If,  a  backtracking  strategy  may  be  used  if  (2.2.3)  fails  to  hold  for  the  steplength 
computed  in  the  one-dimensional  minimization.  If  <  0  and  na  satisfies  (2.2.3)  and 

(2.2.1),  then 


*-'«>  llwllj 

which  implies  convergence  to  a  stationary  point  of  provided  (pa }  remains  uniformly  bounded 
away  from  orthogonality  to  {T'/a}-  If  /r  <  D.S,  both  conditions  (2.2.3)  and  (2.2.4)  are  auto¬ 
matically  satisfied  by  superlinearly  or  quadratically  convergent  algorithms  with  na  =  I  when  za 
is  sufficiently  close  to  a  local  minimum. 

Although  the  theory  allows  considerable  flexibility  in  choosing  the  interpolant  to  ^{n)  and 
other  parameters  in  the  univariate  minimization,  as  well  as  in  the  choice  of  /t  and  in  (2.2.3) 
and  (2.2.1),  in  practice  performance  on  difficult  problems  may  be  very  sensitive  to  these  factors. 
Moreover,  safeguarding  in  univariate  minimization  requires  specification  of  a  finite  interval  of 
uncertainty  in  which  the  minimum  is  presumed  to  lie.  If  pk  ia  very  large,  it  could  happen  that 
no  satisfactory  approximation  to  a  minimum  along  that  direction  can  be  found,  resulting  in  an 
excessively  small  step. 


2.2.2  IVtist'Region  Approach 

Trust-region  methods  were  first  developed  for  nonlinear  least  squares  [Levenberg  (1944): 
Morrison  (1960);  Marquardt  (1963)]  (see  Section  3.2),  and  later  independently  for  general  un¬ 
constrained  minimization  [Goldfeld,  Quandt,  and  Trotter  (1966)].  Motivation  for  trust-region 
methods  corrtes  from  the  following  observation:  if  the  step  to  the  unconstrained  minimum  of  the 
current  local  model  for  f(T  +  p)  -  is  relatively  large,  then  it  probably  falls  outside  the 
region  in  which  the  model  is  applicable.  The  basic  idea  is  to  define  a  neighborhood  of  the  current 


point  over  which  an  approximate  minimization  of  a  local  model  of  the  change  in  T  will  yield  a 
suitable  step  to  the  next  iterate. 

The  local  model  and  constraints  defining  the  neighborhood  are  chosen  so  that  the  subproblem 
has  a  well-defined  minimum,  and  so  that  fast  local  convergence  is  possible  with  the  unconstrained 
model.  Typically,  the  model  at  7a  is  a  quadratic  function  j  Hip,  and  an  upper  bound 

is  imposed  on  a  scaled  I2  norm  of  p.  giving  the  subproblem 

inin  -I- ^  (2.2.6) 

subject  to(|Z?apI(5  - 

For  practical  reasons,  the  scaling  matrices  Da  are  usually  diagonal  (with  positive  diagonal  entries). 
Solving  (2.2.6)  is  equivalent  to  minimizing  the  quadratic  function 

+  5  /  ifik  +  Dj Da )  p  (2.2.7) 

for  some  Aa  >  0,  where  the  matrix  Hk  -f  AaDjDa  is  at  least  positive  semi-definite. 

In  practice,  it  has  been  found  to  be  more  satisfactory  to  control  the  value  of  f>k  directly 
rather  than  Aa  (see  Mori  [1983]).  Increases  and  decreases  in  Aa  are  usually  based  on  comparing 
the  actual  reduction 

+  Pk)-  ^(fk) 

to  the  reduction  predicted  by  the  model, 

V:F7pa  +  lp7ffaPa. 

The  updating  procedure  for  Aa  can  be  as  simple  as  multiplying  the  current  value  by  some  pre¬ 
scribed  factor,  without  compromising  global  convergence  properties  (see  below).  The  preferred 
strategy  for  decreasing  Aa  is  more  complicated.  An  approximate  minimum  it,  of  ^{rk  +  rpa) 
is  computed  by  safeguarded  polynomial  interpolation  (as  in  linesearch  methods  —  see  Section 
2.2.1),  and  rallDapall]  i*  taken  to  be  the  new  value  of  Aa.  It  may  be  necessary  to  decrease  Aa  a 
number  of  times  before  a  suitable  reduction  in  is  achieved  and  the  step  to  a  new  iterate  can 
be  taken. 

Once  Aa  is  assigned  a  value,  it  remains  to  find  pa  when  the  solution  to  (2.2.6)  is  not  an 
unconstrained  minimum.  MorAand  Sorensen  [1983]  obtain  Aa  in  (2.2.7)  by  truncating  a  numerical 
procedure  for  computing  a  zero  of  the  function 

<P(A)  =  ||p*(A)||3  -  Aa  =  ||(/fa  ♦-  XPj  Ih)~'  VJ-a||^  -  Aa, 


(2.2.8) 


based  on  the  work  of  Hebden  [1973]  (see  also  Reinsch  [1971]  and  Gay  [1981])  The  algorithm 
of  Gay  [1983],  implemented  in  the  PORT  Library  [1984],  approximates  pt(  A)  by  a  linear  com¬ 
bination  of  the  (scaled)  steepest  descent  direction  and  the  Newton  direction.  This  technique 
was  devised  by  Powell  [1970]  (see  also  Dennis  and  Mei  [1979]),  and  is  used  because  it  achieves 
comparable  performance  to  methods  that  attempt  to  approximate  ^(A)  closely,  with  considerably 
less  computational  effort. 

Somewhat  stronger  convergence  results  have  been  proven  for  trust-region  methods  than  are 
known  for  linesearch  methods  (see  Section  2.2.1).  Trust-region  methods  converge  to  an  isolated 
local  minimum  under  fairly  mild  conditions  when  exact  second  derivatives  are  used,  and  otherwise 
to  a  stationary  point.  Although  global  convergence  properties  are  not  affected,  in  practice  the 
choice  of  fiQ  and  the  updating  strategy  for  f>i,  are  important.  As  (>%,  and  hence  the  norm  of  p,  is 
made  to  approach  zero,  the  minimizer  of  the  quadratic  becomes  parallel  to  the  steepest  descent 
direction,  Small  values  of  fif,  are  accordingly  safe,  in  the  sense  that  they  guarantee  a 

decrease,  but  progress  may  be  unacceptably  slow  if  no  provision  is  made  for  taking  larger  steps 
where  possible. 

For  more  detail  and  general  discussion  of  trust-region  methods,  see  Mori  [1983],  Shultz, 
Schnabel,  and  Byrd  [198S],  and  Bulteau  and  Vial  [1987].  Related  variants  are  described  in 
Bulteau  and  Vial  [198S]  and  Byrd,  Schnabel,  and  Shultz  [1988]. 

2.2.3  Stationary  Points  and  Directions  of  Negative  Curvature 

It  is  possible  to  decrease  .f  at  a  stationary  point  x“  if  the  Hessian  matrix  has  one  or  more 
negative  eigenvalues.  The  decrease  is  obtained  by  moving  along  a  direction  of  negative  curvature; 
in  other  words,  a  direction  p  for  which  p^3r^/'(7*)p  <  0.  Trust-region  methods  that  use 
the  quadratic  model  with  exact  Hessian  information  (see  Section  2.3.1)  will  yield  directions 
of  negative  curvature  at  stationary  points  when  V^.F’fr*)  is  indefinite,  whereas  the  linesearch 
methods  discussed  above  terminate  when  the  gradient  vanishes. 

A  fundamental  problem  is  that  of  deciding  the  length  of  the  step  to  be  taken  along  a  direction 
of  negative  curvature.  This  problem  is  very  much  related  to  the  problem  of  setting  a  maximum 
step  length  in  order  to  safeguard  a  linesearch  method,  or  that  of  determining  the  step  bound  in  a 
trust-region  method.  First-  and  second-order  information  about  the  function  at  r*  indicates  that 
an  infinite  step  can  be  taken,  since  the  quadratic  part  of  the  Taylor  series  at  z*  Is  unbounded 


bdow  when  it  indefinite.  Clearly  an  infinite  step  is  not  possible  if  T  has  a  finite 

minimum. 


Neither  the  question  of  choosing  a  direction  of  negative  curvature,  nor  the  problem  of  choos¬ 
ing  a  steplength  along  such  directions  has  been  adequately  resolved,  and  thus  in  most  methods 
directions  of  negative  curvature  are  not  explicitly  sought  at  arbitrary  points.  For  research  on 
generating  directions  of  negative  curvature,  and  on  their  use  in  unconstrained  optimization  algo¬ 
rithms,  see  Gill  and  Murray  [1974a],  Fletcher  and  Freeman  [1977],  McCormick  [1977],  Mor4  and 
Sorensen  [1979],  Goldfarb  [1980],  and  Shultz,  Schnabel,  and  Byrd  [1985]. 

2.3  Defining  the  Quadratic  Model 

2.S.1  Secoiid-Derivative  Methods 

There  are  two  basic  frameworks  for  defining  Jfk  in  the  quadratic  model  (2.1.1)  when  second 
derivative  information  is  available:  direct  modification  of  the  Hessian,  and  trust-region  methods. 
Both  can  be  viewed  as  procedures  for  producing  a  positive-definite  quadratic  model  by  modifying 
the  exact  Hessian  A  method  that  combines  the  two  approaches  is  given  in  Chapter  5 

(Section  S)  of  Dennis  and  Schnabel  [1983]. 

The  modified  Newton  method  of  Gill  and  Murray  [1974a]  is  a  linesearch  method  in  which 
the  definition  of  the  search  direction  is  based  on  the  fact  that  if  Jit,  is  positive  definite,  it  can  be 
characterized  by  its  Cholesky  fnclorizntion 

fft  =  RTnt,  (2.S.1) 

where  Rh  is  upper-triangular  and  nonsingular  (see,  e.  g.,  Stewart  [1973],  Chapter  3).  Gill  and 
Murray  alter  the  Cholesky  factorization  procedure  so  that  it  can  be  continued  in  the  event  of 
indefiniteness  or  near-singularity.  The  modified  factorization  is  applied  to  the  Hessian  matrix 
resulting  in  the  Cholesky  factorization  of  a  matrix  Hi,  with  a  prescribed  upper  bound  on 
its  condition  number.  The  matrix  //»  may  differ  from  only  in  fhe  diagonal  elements.  Local 
convergence  properties  of  of  Newton's  method  are  preserved,  because  Hi,  =  whenever 

is  sufficiently  positive  definite.  An  implementation  is  available  in  the  NAG  Library  [1984] 
(subroutine  E04LBF).  For  information  on  other  direct  modification  methods,  see  Gill,  Murray, 
and  Wright  [1981],  Chapter  4,  and  Higham  [1986]. 
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In  trust-region  methods  with  exact  Hessian  information,  a  subproblem  of  the  form 


(2..V2) 

subject  to||D*Hl2  ^ 

is  solved  for  the  step  pi,  to  the  next  iterate.  We  recall  from  the  overview  of  trust- regions  in 
Section  2.2.2  that  solving  (2.3.2)  is  equivalent  to  solving 

•  m\nVJ^^p+^p^{V^^k  +  >^kOjDi)p  (2.3.3) 

for  some  non-negative  value  of  A(,  with  \kDjDf,  positive  semidefinite.  In  particular, 

Aa  will  be  positive  whenever  is  indefinite,  and  also  when  is  positive-definite  if  6^ 

happens  to  be  smaller  than  the  norm  of  the  scaled  unconstrained  minimum  of  the  quadratic 
objective,  in  contrast  to  the  modified  Newton  method  described  above,  all  of  the  eigenvalues  of 
changed  when  Xi,  >  0  in  (2.3.3).  As  long  as  the  constraint  in  (2.3.2)  is  inactive  near 
a  local  minimum,  the  local  convergence  behavior  of  Newton's  method  is  preserved.  A  recent 
implementation  of  a  trust- region  method  that  uses  second  derivatives  is  available  in  the  PORT 
Library  [1984]  (subroutine  OHIH:  see  also  Gay  [1983]).  For  further  information  on  trust-regions 
with  exact  Hessian  information,  see  Fletcher  [1980],  Chapter  S,  Gay  [1981],  Sorensen  [1982], 
Mer4  [1983],  Mord  and  Sorensen  [1983],  and  Shultx,  Schnabel,  and  Byrd  [1985]. 

2.3.2  Qaaisi-Newton  Methods 

In  qnnn-ffewtoM  methods  (also  called  r-arrable  metric  or  meant  methods),  a  sequence  of 

approximations  Ho,H\ . to  the  Hessian  matrix  of  is  generated,  with  depending  on 

Hh  as  well  as  on  gradient  information  at  the  current  iterate.  The  approximate  Hessian  is  required 
to  satisfy  the  qnani-iVeirfon  condition 


Rs+i*»=jrt.  (2.3.4) 

'»  =  »»+i  -  tk  =  -^^k- 

The  quantity  yjxa  approximates  the  curvature,  fjT^fk*k.  of  f  along  Xj^.  Equation  (2.3.4)  does 
not  uniquely  define  //»4i;  papers  that  discuss  completion  of  the  specification  include  Dennis  and 
Mor4  [1977],  Nazareth  [1984],  Todd  [1984],  and  Flachs  [1986].  Conditions  imposed  on  the 
approximate  Hessian  almost  always  include  symmetry  and  positive  definiteness. 


II 


It  is  generally  agreed  that  the  best  overall  performance  is  achieved  by  the  BFGS  update 


//*+.  =  »k  - 


.  VkvJ 

<•*  ifk*k 


although  precise  reasons  for  its  superiority  are  still  not  known  (see,  e.  g.,  Brodlie  [1977]).  Like 
most  proposed  updates,  the  BFGS  update  is  a  rank-two  modification  of  the  current  approximate 
Hessian.  The  BFGS  update  preserves  positive  definiteness  whenever  pj «*  >  0,  a  condition  that 
holds  automatically  in  a  linesearch  method  satisfying  (2.2.4). 

Originally,  quasi-Newton  updates  were  formulated  in  terms  of  rather  than  //«,  so  that 
minimizing  the  quadratic  (2.1.1)  at  each  stage  in  a  linesearch  algorithm  involved  only  a  matrix 
multiplication  (O(n^)  arithmetic  operations)  rather  than  solution  of  a  linear  system  (C7(n’) 
arithmetic  operations).  Gill  and  Murray  [1972]  showed  that  rank-two  quasi-Newton  methods 
could  be  implemented  in  O(n^)  operations  per  iteration  by  applying  an  update  to  a  Cholesky 
factor  (see  Section  2.3.1)  of  Hk-  This  has  the  additional  advantage  that  it  allows  the  numerical 
positive  definiteness  of  Hk  to  be  monitored  from  iteration  to  iteration.  For  more  information  on 
computational  aspects  of  the  update,  see  Dennis  and  Schnabel  [1983],  Chapter  9,  and  Gill  et  al. 
[1985]. 

The  BFGS  method  belongs  to  a  class  of  quasi-Newton  methods  that  can  be  derived  by 
minimizing  the  difference  (ISfa+i  -  Hk)  oi  {H^^  -  777*)-  various  wdghted  norms,  subject 
to  (2. .3. 4)  [Dennis  and  Schnabel  (1979)].  Other  classes  of  methods  attempt  to  minimize  the 
condition  number  of  Hk  by  selecting  parameters  in  a  class  of  updates  at  each  step  [Shanno 
and  Kettler  (1970);  Oren  (1973,  1982);  Davidon  (1975);  Oren  and  Spedicato  (1976);  Spedicato 
(1976);  Schnabel  (1978)].  Al-Baali  and  Fletcher  [1985]  spply  a  scaling  factor  before  updating 
that  minimizes  an  approximate  measure  of  the  error  in  the  inverse  Hessian  matrix.  Performance 
tests  indicate  that  these  modified  methods  are  not  as  successful  as  the  BFGS  method  for  general 
problems  [Brodlie  (1977);  Shanno  and  Phua  (1978b);  Al-Baali  and  Fletcher  (1985)]. 

Under  the  same  assumptions  as  required  for  local  quadratic  convergence  of  Newton's  method, 
quasi-Newton  methods  are  locally  superlinearly  convergent,  provided  Hq  is  sufficiently  close  to 
V^/‘(j’o)  [Broyden,  Dennis,  and  Mor6  (1973)].  Selection  of  the  initial  Hessian  approximation 
Hq  can  be  critical  in  the  performance  of  a  quasi-Newton  method.  Often  the  identity  is  chosen 
because  it  gives  the  steepest-descent  direction  on  the  first  iteration,  and  it  is  positive  definite. 
Computational  tests  have  shown  that  improved  performance  can  sometimes  be  achieved  by  scaling 
Hq  before  performing  the  first  update  [Shanno  and  Phua  (1978a);  Dennis  and  Schnabel  (1983), 
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Chapter  9],  Another  poesibility  is  to  use  a  finite-dilTerence  approximation  to  for  Hq. 

modified  if  necessary  to  ensure  positive  definiteness.  Although  the  choice  of  Hq  can  have  a 
significant  effect  on  performance,  the  question  of  how  best  to  choose  Ho  i*  still  open.  It  is 
generally  agreed  that  exact  or  approximate  curvature  information  should  be  used  to  start  the 
algorithm  if  it  is  available  at  a  reasonable  cost.  For  a  nonlinear  least-squares  problem,  JqJq  on 
be  used  as  the  initial  estimate,  provided  the  columns  of  Jo  linearly  independent. 


0 


% 


n 


3.  Methods  for  Nonlinear  Least  Squares 


3.1  Gauss-Newton  Methods 

The  Gauss-Newton  tnethod  is  a  linesearch  method  in  which  the  search  direction  at  the  current 
iterate  minimizes  the  quadratic  function 

(3  1. 1) 

As  a  model  for  the  change  in  the  least-squares  objective,  (3.1.1 )  has  the  advantage  that  it  involves 
only  first  derivatives  of  the  residuals,  and  that  is  always  at  least  positive  semi-definite.  If 


=  -s,  (3.1.2) 

so  that  fP"  is  a  direction  of  descent  for  whenever  p  0,  as  required  in  a  linesearch  method. 
To  guarantee  convergence,  the  sequence  of  search  directions  must  also  be  bounded  away  from 
orthogonality  to  the  gradient,  a  condition  that  may  not  be  met  by  successive  Gauss-Newton 
directions  unless  the  eigenvalues  of  7^7  are  bounded  away  from  zero.  Powell  [1970a]  gives  an 
example  of  convergence  of  a  Gauss-Newton  method  with  exact  linesearch  to  a  non-stationary 
point. 

The  Gauss-Newton  method  can  be  viewed  as  a  modification  of  Newton's  method  in  which 
7*^7  is  used  to  approximate  the  Hessian  matrix 

m 

7‘^7  -I-  =  7'^7  -I- 

t«i 

of  the  nonlinear  least-squares  objective  function.  The  assumption  is  that  the  matrix  7*^7  should 
be  a  good  approximation  to  the  full  Hessian  when  the  residuals  are  small.  In  fact,  if  J[t’)  =  0 
and  J(r’)  is  positive  definite,  then  the  sequertce  {r*  -I-  pf'*}  is  locally  quadratically 

convergent  to  r‘,  because  J{ri,)  is  an  C’()i>’»  -  z’*||)  approximation  to  the  Hessian  of 

the  nonlinear  least-squares  objective  at  z*. 


Wh«n  J^J  is  singular,  or,  equivalently,  when  ./  has  linearly  dependent  columns,  (•‘t.l  .1 )  does 
not  have  a  unique  minimizer.  The  set  of  vectors  that  minimize  (3.M)  is  the  same  as  the  set  of 
solutions  to  the  linear  least-squares  problem 

mill  II  Jp -I- /llj  .  (3.1.3) 

One  (theoretically)  well-defined  alternative  that  is  often  approximated  computationally  is  to  re¬ 
quire  the  unique  solution  of  minimum  /}  norm: 

niinllpll,.  (3.1.1) 

* 

where  S  is  the  set  of  solutions  to  (3.1.3).  while  another  is  to  replace  J  in  (3.1.3)  by  a  maximal 
linearly  independent  subset  of  its  columns.  In  finite-precision  arithmetic,  there  is  often  some 
ambiguity  about  how  to  formulate  and  solve  an  alternative  to  (3.1.3)  when  the  columns  of  J  are 
"nearly”  linearly  dependent,  which  is  significant  because  the  numerical  solution  of  these  problems 
is  dependent  on  the  criteria  used  to  estimate  the  rank  of  J.  Fraley  [1987b]  gives  some  detailed 
examples  that  illustrate  some  of  the  difficulties  that  arise  in  implementation. 

3.2  Levenberg'Marqnardt  Methodf 

Levcnberg-Marquardt  methods  alter  the  Gauss-Newton  search  direction  in  the  range  of  J, 
by  replacing  J'^J  in  the  quadratic  model  fuiKtion  with  J'^J  +  XD'^D,  where  A  >  0  and  D  is 
a  diagonal  scaling  matrix  with  positive  diagonal  entries.  The  step  p  between  successive  iterates 
minimizes  the  quadratic  model 

+  (3.2.1) 

for  some  A  >  0.  Since  the  matrix  J'^J  +  X D  is  positive  semidefinite,  minimizers  p\  of  ( 3 .2 . 1 ) 
satisfy  the  equations 

(J^J  +  XD'^D)p^-g^  -rj, 

which  are  the  normal  equations  for  the  linear  least-squares  problem 

min  9^p  +  X  p^J^-fp 
r€*"  2 

1.'. 


(3.2.2) 


(3.2.3) 


Equivalently,  p  solves 


(3.2.1) 


subject  to  ilDHI;  ^ 

for  some  A  >  0;  that  is.  the  Gauss-Newton  quadratic  model  is  minimized  subject  to  a  trust-region 
constraint. 

Considerable  research  effort  has  been  directed  toward  improvements  in  this  class  of  methods 
since  their  introduction  by  Levenberg  [1944],  Morrison  [I960],  and  Marquardt  [1963]  for  nonlinear 
least-squares  problems,  and  independently  by  Goldfeld,  Quandt,  and  Trotter  [1966]  for  general 
unconstrained  optimization.  Mor6  [1978]  gives  an  implementation  in  which  he  adjusts  the  step 
bound  fi  in  (.3.2.1)  rather  than  A,  a  strategy  used  in  trust-region  methods  for  unconstrained 
optimization  (see  Mor6  [1983]  for  a  survey).  Changes  in  6  depend  on  agreement  between  the 
actual  reduction  in  the  sum  of  squares 

\  (|I/(^  +  pa)II^-||/(t)||’). 

with  the  reduction 

q'^Vx  +  \pxJ^JPx 

predicted  by  the  model  J^J  +  D,  which  is  the  optimum  value  of  the  objective  in  (S.2.4). 
Increases  are  accomplished  by  taking  Si,^i  s  2|| 77a Fails*  '*l**'l<  ^  '*  decreased  by  multiplying  by 
the  factor  7  <  1.  In  order  to  obtain  A  when  the  bound  in  (.3.2.4)  is  active,  the  nonlinear  equation 

=  ll^^Fills  -  «  =  I  +  XD’^Df'  =  0  (»-2-5) 

is  approximately  solved  by  truncating  a  safeguarded  Newton  method  based  on  the  work  of  Hebden 
[1973]  (see  also  Reinsch  [1971]).  Mor6  reports  that,  on  the  average,  (3.2..5)  is  solved  fewer 
than  two  times  per  iteration.  He  also  proves  global  convergence  to  a  stationary  point  of  /'^/, 
without  assuming  boundedness  for  {Aa}.  Many  computational  details  are  given,  including  an 
efficient  method  for  calculating  the  derivative  of  '9(A)  in  (.3.2..'>)  that  uses  the  QR  factorization 
of  J.  Equation  (3.2.2)  is  solved  by  a  modification  of  the  two-stage  factorization  described  by 
Osborne  [1972]  that  allows  column  pivoting.  Subroutine  LHDER  in  MINPAC'K  [Mor6,  Garbow, 
and  Hillstrom  (1980)]  is  an  implementation  of  the  method.  Variables  are  scaled  internally  in 
LKDER  according  to  the  following  scheme;  the  initial  scaling  matrix  Dq  is  the  square  root  of  the 
diagonal  of  .7^.7  evaluated  at  /o.  and  the  rth  diagonal  element  of  Dk  is  taken  to  be  the  maximum 
of  the  rth  diagonal  element  of  77a_|  and  the  square  root  of  the  rth  diagonal  element  of  J^J. 
Numerical  results  are  presented  indicating  that  this  scaling  compares  favorably  with  those  used 
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in  earlier  research.  The  user  alto  has  the  option  of  providing  an  initial  diagonal  scaling  matrix 
that  is  retained  throughout  the  computation. 

3.3  Corrected  Gauss- Newton  Methods 

Gill  and  Murray  [1976]  propose  a  linesearch  algorithm  that  divides  into  complementary 
subspaces  ft  and  Af,  where  ft  C  and  it  nearly  orthogonal  to  77(7^).  The  search 

direction  is  the  sum  of  a  Gauss-Newton  direction  in  ft,  and  a  projected  Newton  direction  in 
Xt.  This  strategy  avoids  a  shortcoming  of  Gauss-Newton  methods  —  that  components  of  the 
search  direction  that  are  nearly  orthogonal  to  may  not  be  wall  determined  when  J  is 

ill-conditioned  —  because  each  component  is  computed  from  a  reasonably  well-conditioned  sub¬ 
problem.  The  vector  t  -  t‘  may  become  almost  entirely  in  in  a  Gauss-Newton  method, 

\ 

yet  the  algorithm  computes  a  search  direction  that  is  virtually  orthogonal  to  77^7'*')  due  to  ill  con¬ 
ditioning  in  the  Jacobian  (see  Fraley  [1987b]).  Gill  and  Murray  show  that  both  Gauss-Newton 
algorithms  defined  by  (.^.1.4)  and  Levenberg-Marquardt  algorithms  generate  search  directions 
that  lie  in  77(7'^),  while  the  Newton  search  direction  generally  will  have  a  component  in  V(7), 
the  orthogonal  complement  of  whenever  7  has  linearly  dependent  columns.  For  prob¬ 

lems  with  small  residuals,  they  point  out  that  7'*' 7  is  a  reasonable  approximation  to  the  full 
Hessian  in  ^(7'^),  but  not  in  .^(7).  Thus,  in  situations  where  x  —  r*  is  orthogonal  to  7l(J^), 
and  7  is  well-conditioned  but  has  litrearly  dependent  columns  (for  example,  when  m  <  tr),  the 
Gauss-Newton  and  Levenberg-Marquardt  directions  have  no  component  in  the  direction  of  x-t‘, 
while  Newton's  method  and  also  the  method  of  Gill  and  Murray  would  have  components  in  both 
^(7^)  and  A'’(7). 

A  version  of  this  algorithm  called  the  rorrrcled  Gontw-Newton  method  [Gill  and  Murray 
(1978)]  forms  the  basis  for  the  nonlinear  least-squares  software  in  the  NAG  Library  [1984],  Rules 
based  on  the  relative  size  of  the  singular  values  of  7  are  given  for  choosing  an  integer  gradc{J)  to 
approximate  rnnk{J),  and  an  attempt  is  made  to  group  together  singular  values  that  are  similar 
in  magnitude.  The  method  is  not  as  sensitive  to  grotlr{J)  as  Gauss-Newton  is  to  rank  estimation, 
both  because  of  the  division  of  the  computation  of  the  search  direction  into  separate  components 
in  V  and  .V,  and  because  gradt{J)  is  varied  adaptively  based  on  a  measure  of  the  progress  of 
the  minimization.  Moreover,  the  rate  of  convergence  is  potentially  faster  than  Gauss-Newton 
or  Levenberg-Marquardt  methods  on  problems  with  nonzero  residuals.  The  quantity  gradr(J) 
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is  reduced  when  the  sum  of  squares  is  not  adequately  decreasing,  so  that  there  is  the  potential 
of  having  .V  =  If"  (with  exact  second  derivatives,  this  implies  taking  full  Newton  steps)  in  the 
vicinity  of  a  solution. 

When  ravk{J)  =  gr«dr{J)  =  n,  the  search  direction  p  is  a  full-rank  Gauss-Newton  direc¬ 
tion.  Otherwise  the  vector  p  is  computed  as  the  sum  of  two  mutually  orthogonal  components;  a 
C*uss-Newton  direction,  and  a  projected  Newton  direction.  The  projected  Hessian  is  replaced  by 
a  modified  Cholesky  factorization  (see  Section  2.3.1)  if  it  is  computationally  singular  or  indefinite. 
A  modified  Newton  search  direction  (corresponding  to  the  case  gradr{J)  =  0)  is  used  when¬ 
ever  if  p)\  is  smaller  than  some  prescribed  value,  or  if  g'^^p  is  positive.  A  quasi-Newton 

approximation  to  B  (see  the  discussion  in  Section  3.4)  and  a  finite-difference  approximation  to 
the  projected  matrix  Z^BZ  along  the  columns  of  Z,  where  7  is  an  orthogonal  basis  for  are 
^ven  as  alternatives  to  handle  cases  in  which  second  derivatives  of  the  residual  functions  are  not 
available  or  are  difficult  to  compute.  Gill  and  Murray  test  their  method  on  a  set  of  twenty-three 
problems,  and  find  that  the  version  that  uses  quasi-Newton  approximations  to  B  does  not  per¬ 
form  as  well  as  those  that  use  exact  second  derivatives  or  finite-difference  approximations  to  a 
projection  of  B.  They  observe  only  linear  convergence  for  the  quasi-Newton  version  on  problems 
with  large  residuals.  The  algorithms  are  implemented  in  the  NAT!  Library  [1984];  subroutine 
E04HEF  uses  exact  second  derivatives,  while  subroutine  E04GBF  is  the  quasi-Newton  version. 

3.4  Special  Quasi-Newton  Methods 

Special  quasi-Newton  methods  for  nonlinear  least  squares  use  a  Hessian  of  the  form  J'^J  +  B 
in  the  quadratic  model,  so  that  the  search  direction  differs  from  the  Gauss-Newton  direction 
in  and  also  has  a  component  in  A^(J)  when  J  is  rank-deficient.  The  matrix  ^  is  a 

quasi-Newton  approximation  to  the  term  B  —  in  the  Hessian  of  the  nonlinear 

least-squares  objective.  Brown  and  Dennis  [1971]  first  proposed  a  method  in  which  the  Hessian 
matrix  of  each  of  the  residuals  was  updated  separately.  This  approach  is  impractical  because  it 
entails  the  storage  of  in  symmetric  matrices  of  order  rr,  and  more  recent  research  has  aimed  to 
approximate  B  *%  t  sum. 

Gill  and  Murray  [1978]  discuss  a  linesearch  method  in  which  they  use  the  augmented  Gauss- 
Newton  quadratic  model  only  to  compute  a  component  of  the  search  direction  in  a  subspace 
that  approximates  the  null  space  of  the  Jacobian  (see  Section  3.3).  They  apply  the  BFCiS 
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formula  for  unconstrained  optimization  (see,  e.  g.,  Dennis  and  Mori  [1977])  to  the  matrix  //»  = 
A+i  +  Bk  with  the  quasi-Newton  condition 

=  Pk 


where 

»nd  F»  =  jf*+i-Js. 

and  then  form  Bk+i  =  Hk+)  -  Jj+i J*4i-  They  point  out  that,  if  J^+iA+i  +  '*  positive 

definite,  and  >  0,  then  +  Bk^\  i*  also  positive  definite  with  this  scheme.  In 

order  to  safeguard  the  method,  the  projected  approximate  Hessian  is  replaced  by  a  modified 
Cholesky  factorization  when  it  is  singular  or  indefinite.  In  addition,  if  the  cosine  of  the  angle 
between  the  search  direction  and  the  gradient  of  the  nonlinear  least-squares  objective  exceeds  a 
fixed  threshold  value,  a  modified  Newton  step  with  the  full  augmented  approximate  Hessian  is 
taken.  See  Section  3.3  for  a  summary  of  th«r  observations  on  the  performance  of  the  methods. 

Dennis,  Gay,  and  Welsch  [1981a]  apply  a  scaled  DFF  update  to  Bk  at  each  step.  The  new 
approximation  Bk+i  solves 


subject  to 


where 


mn\\B-^fHnBk-B)ir^f^\\F 

H»k  —  fai  B  positive  definite 
B«k  =  Jk+i/k+i  -  B  symmetric, 


T>  s  mindfjas/ai^^sasl,]}. 


The  Kale  factor  7%  is  based  on  the  obMrvation  that  the  quasi-Newton  approximation  to  B  is 
often  too  large  svith  the  unKaled  update,  on  account  of  the  contribution  of  the  residuals.  The 
term  ^  '*  derived  from  the  Mif-Ksling  principles  for  quasi-Newton  methods 

of  Orcn  [1973],  and  attempts  to  shift  the  eigenvalues  of  the  approximation  Bk  to  overlap  with 
those  of  Bk,  using  curvature  infMmation  at  rs.  The  algorithm  forms  the  basis  for  the  ACM 
computer  program  IL3S0L  [Dennis,  Gay.  and  WeiKh  (1981b)],  which  is  distributed  by  the  PORT 
Library  [1984]  as  subroutines  130  and  DI30.  It  is  implerrrented  as  an  adaptive  method,  in  that 
Gauss-Newton  steps  are  taken  if  the  Gauss-Newton  quadratic  nrodel  predicts  the  reduction  in 
the  function  better  than  the  quadratic  model  that  includes  the  term  involving  B.  A  trust-region 


in 


str»tegy  is  used  to  enforce  global  convergence.  Numerical  results  are  given  in  Dennis,  Gay,  and 
Welsch  {1981a]  for  a  set  of  twenty-four  test  problems,  many  with  two  or  three  different  starting 
values. 


» 
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4.  Discussion  and  Summary  of  Numerical  Results 

In  this  section,  we  summarize  numerical  results  obtained  for  the  unconstrained  optimization 
and  nonlinear  least-squares  methods;  more  detailed  results  are  tabulated  in  the  Appendix.  The 
tests  were  performed  using  the  following  software: 

•  DNIG/SUNSOL  -  Trust-region  method  for  unconstrained  optimization  that  uses  a  quasi- 
Newton  approximation  to  the  Hessian  matrix.  From  the  PORT  Library  [1984], 

•  IPSOL  •  Linesearch  method  for  unconstrained  optimization  that  uses  a  quasi-Newton  ap¬ 
proximation  to  the  Hessian  matrix.  From  the  Systems  Optimization  Laboratory,  Stanford 
University  (see  Gill  et  ai.  [1986b:  1987]).  Also  available  in  the  NAG  Library. 

•  OMIH/HDNSOL  •  Trust-region  method  for  unconstrained  optimization  that  uses  analytic  second 
derivatives.  From  the  PORT  Library  [1984]. 

•  nk '  Linesearch  method  for  unconstrained  optimization  that  uses  analytic  second  derivatives. 
This  implementation,  which  is  available  at  Stanford  Linear  Accelerator  Center,  is  from  the 
National  Physical  Laboratory,  England.  It  is  essentially  the  same  as  subroutine  E04UP  from 
the  NAG  Library  [1984]. 

•  G-N  •  Gauss-Newton  algorithm  for  nonlinear  least  squares  that  uses  LSSOL  (Gill  et  al. 
[1986a])  to  solve  the  linear  least  squares  subproblems,  and  the  linesearch  from  IPSOL  (Gill 
et  ai.  [1986b]).  Both  LSSOL  and  IPSOL  are  also  available  in  the  NAG  Library. 

•  LMOER  •  Levenberg-Marquardt  method  for  nonlinear  least  squares.  From  MTNPACK  (Mor4, 
Garbow,  and  Hillstrom  [1980]). 

a  DI3G/IL2S0L  •  Adaptive  method  for  nonlinear  least  squares  (combines  Gauss-Newton, 
Levenberg-Marquardt,  and  special  quasi-Newton  methods).  From  the  PORT  Library  [1984]. 

•  LSQPDQ  -  Corrected  Gauss-Newton  method  that  uses  a  quasi-Newton  approximation  to  the 
Hessian  matrix.  This  implementation,  which  is  available  at  Stanford  Linear  Accelerator 
Center,  is  from  the  National  Physical  Laboratory,  England.  It  is  essentially  the  tame  at 
subroutine  E046BF  from  the  NAG  Library  [1984]. 

•  LSQSDff  -  Corrected  Gauss-Newton  method  that  uses  analytic  second  derivatives.  This  im¬ 
plementation,  which  is  available  at  Stanford  Linear  Accelerator  Center,  is  from  the  National 
Physical  Laboratory,  England.  It  is  essentially  the  same  as  subroutine  E04HEF  from  the  NAG 
Library  [1984]. 
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Information  about  the  individual  test  problems  is  given  in  the  Appendix.  The  number  of 
function  evaluations  required  by  each  subroutine  is  listed  in  the  tables  at  the  end  of  this  section. 

In  addition,  the  following  symbols  are  used; 

^  zero-residual  problem 

'  linear  least-squares  problem 

-  failure  to  achieve  an  approximate  solution 

^  appears  to  be  unable  to  terminate  at  an  approximate  solution 

*  local  minimum  ^ 

'  termination  criteria  satisfied  at  a  point  away  from  a  local  minimum 

*  failed  with  default  step  length  or  trust-region  size 

Two  columns  of  figures  corresponding  to  two  different  values  of  a  single  parameter  are  given 
for  each  subroutine.  For  the  Gauss-Newton  methods,  the  parameter  affects  rank  estimation;  for 
all  of  the  other  methods,  the  parameter  affects  termination  criteria.  See  the  tables  of  numerical 
results  given  in  the  Appendix  for  information  about  the  precise  choices  that  were  made.  The 
wide  variability  in  the  numerical  results  makes  it  difficult  to  draw  definitive  conclusions  about 
the  relative  performance  of  the  software,  because  observations  of  small  samples  could  result  in 
misleading  generalizations.  The  sources  of  this  variability  are  discussed  below. 

First,  the  number  of  function  evaluations  may  n^t  be  an  adequate  basis  for  comparison. 

The  routines  vary  in  the  number  of  gradient  evaluations  performed  per  function  evaluation,  and 
second-derivative  methods  require  evaluation  of  the  Hessian  matrix.  Moreover,  when  function 
evaluations  are  relatively  inexpensive,  costs  could  be  dominated  by  other  portions  of  the  compu¬ 
tation.  Another  difficulty  in  making  comparisons  is  that  the  definition  of  an  acceptable  minimum 
varies  from  routine  to  routine.  For  example,  the  norm  of  the  gradient  of  the  nonlinear  least- 
squares  objective,  ||.(;||,  at  an  alleged  solution  r'  may  differ  considerably  for  different  software, 
although  .9(7*)  =  0  is  a  necessary  condition  for  a  minimum.  (On  problem  10.,  LMDER  terminates 
at  a  point  for  which  ||;||  is  of  order  lO'^,  while  DW26  terminates  at  a  point  for  which  ||;||  is  of  m 

order  ]0~^.)  Most  algorithms  do  not  attempt  to  reduce  ||.9||  directly,  but  convergence  criteria 
may  include  a  threshold  on  ||p(z*)||.  Depending  on  how  this  threshold  is  used  In  relation  to  other  ^ 

criteria,  some  routines  may  spend  more  function  evaluations  in  anticipation  of  a  reduction  in  ||^|| 
than  others.  A  small  value  of  ||.9||  means  greater  certainty  that  a  minimum  has  actually  been 
obtained,  but  may  be  unreasonably  expensive  to  achieve  In  practice. 
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Second,  aside  fiom  design  choices  that  define  a  particular  implementation  of  an  algorithm, 
the  user  is  permitted  to  specify  certain  parameters  that  may  affect  performance.  Fraley  [1987b] 
gives  examples  that  illustrate  the  sensitivity  of  Gauss-Newton  methods  to  rank  estimation  criteria 
(see,  e.  g.,  problems  35b.,  36«.,  and  20d.).  For  problems  on  which  an  algorithm  is  linearly 
convergent,  small  changes  in  tolerances  that  are  used  to  define  convergence  criteria  can  mean 
substantial  differences  in  the  amount  of  computation  required  in  order  to  obtain  a  point  satisfying 
conditions  for  convergence  (see,  e.  g.,  DNMG  on  24b..  and  LMDER  on  40.).  Selection  of  a  maximum 
steplength  or  an  initial  trust-region  radius  can  also  be  critical  factor  in  the  performance  of  a 
method.  In  these  tests,  the  default  values  for  these  parameters  were  altered  only  in  cases  where 
a  method  was  initially  observed  to  fail  by  attempting  to  evaluate  problem  functions  outside  the 
region  in  which  they  are  numerically  defined  (see,  e.  g.,  the  results  for  the  DeVilliers  and  Glasser 
test  problems  42.  and  43.).  Failures  of  this  sort  may  be  caused  either  by  poor  scaling  among 
the  variables  in  the  problem,  or  by  ill-conditioning  within  subproblems.  Hiebert  [1981]  is  of  the 
opinion  that  failure  due  to  ill-conditioning  can  be  avoided  in  software,  but  that  it  is  not  always 
possible  to  anticipate  abnormal  terminations  that  are  caused  by  bad  scaling. 

In  IPSOL,  one  can  specify  bounds  on  the  variables,  a  well  as  adjust  the  maximum  step 
length,  in  order  to  deal  with  this  type  of  difficulty.  Subroutine  E04LBF,  the  version  of  MIA  that 
is  available  in  the  NAG  Library  [1984],  also  provides  for  bounds  on  variables,  and  there  are 
alternative  versions  of  all  of  the  PORT  software  used  in  these  tests  that  allow  bounds  to  be 
specified.  Unfortunately,  when  bounds  on  the  variables  are  included  in  the  problem  formulation, 
local  minima  at  which  the  bounds  are  active  may  be  found  rather  than  local  minima  for  the 
nonlinear  least-squares  problem.  See  Hanson  and  Krogh  [1987]  for  numerical  results  in  which 
simple  bounds  are  included  for  some  problems.  Holt  and  Fletcher  [1979]  give  an  algorithm 
designed  for  nonlinear  least-squares  problems  with  explicit  bounds  on  the  variables. 

Third,  the  performance  of  any  given  method  over  the  set  of  test  problems  is  by  no  means 
uniform,  and  it  is  not  easy  to  separate  the  problems  into  classes  for  which  the  behavior  of  an 
algorithm  can  be  categorized.  One  reason  for  this  is  that  many  of  the  test  problems  recur  in 
the  literature  precisely  because  they  have  certain  distinguishing  properties.  Powell’s  singular 
function  and  variants  (13.  and  22.)  are  zero-residual  problems  in  which  the  Jacobian  becomes 
singular  at  the  solution.  The  McKeown  test  problems  (39.,  40.,  and  41.)  are  chosen  so  that 
the  Jacobian  is  well-conditioned  everywhere,  and  the  rate  of  convergence  for  the  Gauss-Newton 
method  with  unit  steplength  can  be  controlled  by  varying  a  single  parameter  (the  parameter  can 
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I 
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•iso  b«  chosen  so  that  the  unit-step  Gauss-Newton  method  diverges).  Both  Powell's  singular 
function  and  McKeown's  test  problems  are  constructed  analytically  rather  than  derived  from 
data-fitting  applications.  The  matrix  square  root  problems  (36.)  are  examples  of  small,  dense, 
nonlinear  systems  of  equations  requiring  a  very  accurate  solution.  Watson's  function  (20.)  comes 
from  polynomial  interpolation,  and  has  multiple  local  minima  with  small,  but  nonzero,  residuals. 
It  also  has  the  feature  that  the  Jacobian  becomes  increasingly  ill-conditioned  as  the  problem 
size  is  increased  (see  Fraley  [1987b]).  The  Gulf  Research  and  Development  function  (11.)  has 
discontinuities  in  the  derivative  of  each  residual  on  a  one-dimensional  subspace  and  hence  violates 
the  assumption  (made  in  developing  all  of  the  algorithms  we  have  discussed)  that  the  sum  of 
squares  has  continuous  second  derivatives.  The  results  for  the  DeVilliers  and  Glasser  test  problems 
(42.  and  43.)  illustrate  variability  in  performance  due  to  the  use  of  different  starting  values. 
More  generally,  the  behavior  of  a  given  method  for  a  certain  type  of  residual  function  may  not 
be  uniform  over  several  sets  of  defining  data  of  similar  magnitude,  as  shown  by  the  results  for 
the  Dennis,  Gay,  and  Vu  test  problems  (44.  and  45.). 

Finally,  there  is  considerable  diversity  in  performance  among  the  routines  tested,  and  few 
generalizations  are  possible.  Our  data  generally  supports  the  use  of  nonlinear  least-squares  soft¬ 
ware  over  that  designed  for  general  unconstrained  minimization,  but  there  are  some  exceptions 
(see,  e.  g.,  the  McKeown  test  problems  39.  •  41.).  Of  the  nonlinear  least-squares  routines,  0126 
(1IL2S0L)  is  often  the  best  (the  Dennis,  Gay,  and  Vu  test  problems  44.  and  45.  are  examples  of 
exceptions).  When  second  derivatives  are  relatively  cheap  to  obtain,  the  use  of  an  unconstrained 
optimization  method  that  uses  exact  second  derivatives  may  be  a  reasonable  alternative  to  a 
nonlinear  least-squares  method  (see,  e.  g.,  the  results  for  the  penalty  function  23b.).  Our  tests 
do  not  indicate  overall  superiority  of  any  particular  method  over  the  others;  in  situations  in  which 
a  variety  of  problems  are  being  solved,  we  conclude  that  it  is  desirable  to  have  the  flexibilty  to 
choose  from  among  several  methods. 
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Summary  of  Results  :  Unconstriuned  Optimisation  Methods 
(number  of  fnnction  e\'slns(ions) 


Mor^i  Garbow,  and  Hillstrom  Test  Problems 


n  m  DHIG  IPSOL  DNIR  MIA 


1." 

2 

2 

40 

42 

27 

29 

32 

32 

14 

14 

2." 

2 

2 

12* 

12* 

9 

10 

10' 

10' 

8 

8 

3." 

2 

2 

217 

220 

897 

897 

1-30 

132 

175 

175 

4.® 

2 

3 

66 

67 

20 

20 

22 

23 

1 

1 

5.® 

2 

3 

16 

17 

20 

22 

11 

12 

35* 

35* 

6. 

2 

10 

33 

34 

14* 

15* 

11 

11 

12 

12 

7.® 

3 

3 

28 

30 

37 

38 

16 

17 

14 

14 

8. 

3 

15 

19 

22 

22 

23 

9 

10 

11 

11 

9. 

3 

15 

8 

12 

8 

9 

4 

4 

3 

3 

10. 

3 

16 

465 

467 

450 

450 

382 

.388 

249 

249 

11.® 

3 

4' 

327 

mm 

2' 

290 

292 

538 

538 

12.® 

3 

10 

43 

45 

34 

35 

24 

24 

43 

43 

13.® 

4 

4 

62* 

89 

66 

71 

27* 

38 

38 

23*  23* 

14.® 

4 

6 

100 

102 

Kl 

51 

42 

49 

54 

54 

15. 

4 

11 

35 

36 

33 

35 

11 

12 

20 

20 

16. 

4 

20 

46 

47 

24 

25 

11 

13 

9 

10 

17. 

5 

33 

69 

72 

32*' 

56* 

46* 

47* 

43 

43 

18.® 

6 

13 

45 

47 

45' 

45' 

- 

- 

44 

44 

19. 

11 

65 

69 

72 

88 

90 

23 

24 

7' 

8* 

20s. 

6 

31 

35 

37 

43 

46 

15 

15 

13 

13 

20b. 

9 

31 

76 

79 

83 

85 

20 

22 

14 

14 

20c. 

12 

31 

89* 

148* 

55' 

151 

24 

24 

14 

14 

20d. 

20 

31 

no* 

134' 

msm 

114' 

50* 

1295' 

1295' 

21a.® 

10 

10 

120 

125 

101 

104 

25 

26 

14 

14 

21b.® 

20 

20 

189 

193 

252 

265 

27 

27 

14 

14 

22a.® 

12 

12 

143* 

235 

83* 

165* 

28* 

40 

2.3* 

23* 

22b.® 

20 

20 

187* 

344 

103* 

196* 

29* 

40 

24* 

24* 

23a. 

4 

5 

77 

78 

198 

198 

42 

4.3 

43 

43 

23b. 

10 

11 

80 

81 

117 

124 

44 

45 

44 

44 

24a. 

4 

8 

364 

472 

23' 

462 

126 

128 

158 

158 

24b. 

10 

20 

475 

632 

368 

419 

1.58 

162 

133 

133 

26 


Summary  of  Reaults  :  Nonlinear  Least-Squares  Methods 


(number  of  fanction  evalnalions) 


Mor4,  Garbow,  and  Hiilstrom  Test  Problems 


G-N  LKDER  DI26  LSQFDQ  LSQSOI 


1® 

31 

31 

22 

22 

14 

14 

31 

31 

31 

31 

2.® 

180* 

235* 

14* 

21* 

10* 

12* 

36' 

36* 

18* 

18* 

3.® 

31* 

42 

19 

19 

64 

65 

47 

47 

47 

47 

4.® 

54 

54 

40* 

54 

40* 

53 

64 

64 

53 

53 

5.® 

8 

8 

9 

10 

9 

10 

14 

14 

10 

10 

6. 

21 

28 

14 

16 

54 

54 

36* 

36* 

7.® 

13 

13 

11 

12 

13 

14 

20 

20 

14 

14 

8. 

m 

1 

7 

6 

7 

7 

8 

13 

13 

6 

6 

9. 

3 

3 

4 

5 

3 

5 

3 

3 

3 

3 

10. 

126 

126 

132 

133 

18 

18 

17 

17 

11.® 

- 

- 

- 

- 

- 

- 

69* 

69* 

30** 

12.® 

7 

7 

7 

8 

8 

9 

12 

12 

8 

12 

13.® 

16* 

16* 

65 

65 

19* 

25 

18* 

18* 

18* 

18* 

14.® 

96 

96 

70 

70 

52 

53 

81 

81 

87 

99 

15. 

43 

43 

18 

28 

11 

12 

30 

16 

16 

16. 

3651 

3651 

264 

356 

21 

22 

62 

62 

45 

45 

17. 

13 

13 

18 

19 

26 

27 

19 

19 

14 

18 

18.® 

- 

46 

46 

45 

46 

243* 

247* 

19. 

34 

34 

17 

19 

10131 

22 

33 

33 

19 

19 

20s. 

13 

12 

8 

10 

13 

13 

32 

32 

9 

9 

30b. 

6 

6 

9 

10 

12 

15 

6 

6 

6 

6 

30c. 

6 

6 

10 

13 

14 

14 

6 

6 

6 

6 

20d. 

6* 

6 

18 

23 

7* 

18 

18 

10 

12 

21s.® 

31 

31 

22 

22 

27 

la 

31 

31 

31 

31 

21b.® 

31 

31 

23 

23 

16 

16 

31 

31 

31 

31 

22s.® 

16* 

16* 

Kl 

rai 

20* 

26 

18* 

18* 

18* 

18* 

22b.® 

16* 

16* 

69 

69 

19* 

26 

18* 

18* 

18* 

18* 

33s. 

90 

10311 

34 

44 

36 

37 

Bl 

58 

58 

23b. 

■SI 

274 

84 

104 

61 

68 

143 

143 

124 

124 

24s. 

1043 

1043 

151 

156 

139 

142 

411 

411 

228 

228 

2ib. 

80 

88 

129 

138 

566* 

566* 

150 

150 

27 


Summitry  of  Results  :  Unconstrained  Optimisation  Methods  . 
(nambet  of  fanction  et-slnaiions) 


Mori,  Garbow,  and  Hillstrom  Test  Problems  (con tinned) 
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m 

DI1I6 

■PSOL 

DHiR 
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25a.‘* 

10 

12 

20 

21 

19 

20 

15 

15 

14 

14 

25b.‘' 

20 

22 

25 

26 

24 

24 

18 

19 

17 

18 

26a.® 

10 

10 

34' 

37* 

33* 

35' 

ll' 

12' 

21 

21 

26b.® 

20 

20 

62' 

65' 

IS 

78' 

20' 

20' 

30 

30 

27a." 

10 

10 

13 

16 

18 

19 

9 

10 

22 

22 

27b.® 

20 

20 

15 

18 

32 

11 

12 

31 

31 

28a.® 

10 

10 

31 

34 

33 

36 

4 

4 

4 

4 

28b.® 

20 

20 

60 

64 

54 

56 

4 

4 

4 

4 

29a.® 

10 

10 

8 

10 

7 

8 

4 

5 

4 

4 

29b.® 

20 

20 

8 

10 

7 

8 

4 

5 

4 

4 

30a.® 

10 

10 

51 

57 

37 

40 

6 

7 

7 

7 

30b.® 

20 

20 

65 

88 

65* 

67* 

6 

7 

7 

7 

31a.® 

10 

10 

46 

60 

67* 

70* 

9 

9 

9 

31b.® 

20 

20 

47 

63 

141 

144 

9 

9 

9 

9 

32.‘ 

10 

20 

6 

6 

2 

2 

6 

6 

4 

4 

33.‘ 

10 

20 

4 

4 

4 

4 

5 

5 

27 

27 

34.* 

5 

5 

4 

4 

6 

6 

20 

20 

35a. 

8 

8 

34 

38 

31 

33 

14 

14 

41 

41 

35b.® 

9 

9 

44 

46 

29 

32 

17 

18 

66 

66 

35c. 

10 

10 

41' 

45' 

37* 

42' 

19 

20 

86 

86 

Matrix  Square  Root  Test  Problems 

n 

m 

DRIQ 

■PSOL 

DHIH 

miA 

36a.® 

4 

4 

- 

- 

- 

- 

- 

- 

36b." 

9 

9 

- 

- 

_# 

- 

- 

- 

- 

36c." 

9 

9 

69 

101 

3 

3 

31 

35 

3188  3188 

36d.® 

9 

9 

- 

- 

_< 

- 

- 

- 

- 

28 


Summary  of  Results  t  Nonlinear  Least-Squares  Methods 


(number  of  function  evninstions) 


Mord,  Garbow,  and  Hillstrom  Test  Problems  (continued) 


G-N  LKDES  DI2Q  LSQPDQ  LSQSN 


25a.® 

11 

11 

11 

12 

15 

16 

16 

16 

12 

17 

25b.® 

13 

13 

13 

14 

19 

19 

18 

18 

14 

19 

26a.® 

16 

16 

28* 

37* 

11 

12 

22 

22 

18 

22 

26b.® 

20 

20 

57* 

71* 

39 

42 

18 

24 

18 

26 

27a.® 

21 

21 

15 

15 

8 

9 

26* 

26* 

22* 

28* 

2rb.® 

31* 

31* 

5' 

18 

11 

12 

31* 

31* 

21* 

27* 

28a.® 

4 

4 

5 

5 

4 

4 

4 

4 

4 

4 

28b.® 

4 

4 

5 

5 

3 

4 

4 

4 

4 

4 

29a.® 

4 

4 

5 

5 

4 

4 

10 

10 

6 

6 

29b.® 

4 

4 

5 

5 

4 

4 

10 

10 

6 

6 

30a.® 

6 

6 

6 

7 

8 

9 

11 

11 

7 

11 

O 

1 

6 

6 

6 

7 

8 

9 

11 

11 

7 

11 

31a.® 

7 

7 

7 

8 

10 

11 

12 

12 

IBI 

13 

31b.® 

7 

7 

7 

8 

10 

11 

12 

12 

8 

12 

32.‘ 

2 

2 

3 

3 

5 

5 

2 

2 

2 

3 

33.‘ 

3 

- 

3 

8 

18 

18 

2 

2 

2 

2 

34.*^ 

3 

3 

3 

7 

13 

13 

2 

2 

2 

2 

35a. 

- 

40 

53 

23 

24 

87* 

87* 

74 

74 

35b.® 

148 

12 

13 

11 

11 

34 

34 

30 

34 

35c. 

- 

25 

34 

17* 

19* 

73** 

73** 

43* 

43* 

Matrix  Square  Root  Test  Problems 
G-N  LHDElt  DI36  LSQFDQ  LSQSDI 


36a.® 

2885* 

36 

- 

- 

- 

- 

44 

44 

38 

38 

36b.® 

683* 

36 

9 

10 

16* 

- 

44 

44 

38 

38 

36c.® 

29 

40 

16 

22 

28 

28 

28 

28 

36d.® 

74* 

- 

2 

2 

Hi 

n 

424 

424 

380 

380 

2'» 


Summary  of  Rcanlts  :  Unconstrained  Optimisation  Methods 
(nambet  of  fnnction  evslastions) 


Hanson  Test  Problems 


n  m  DMIG  IPSOL  DUMB  NBA 


37. 

2 

16 

22 

22 

H 

15 

16 

17 

6 

6 

38. 

3 

16 

31 

32 

21* 

23* 

14 

14 

13 

13 

McKeown  Test  Problems 


n  m  1»IBG  BPSOL  DNBB  NBA 


39a. 

2 

3 

9 

11 

10 

11 

4 

4 

4 

4 

39b. 

2 

3 

9 

10 

9 

10 

4 

4 

4 

4 

39c. 

2 

3 

6 

7 

6 

8 

4 

5 

4 

4 

39d. 

2 

3 

8 

9 

10 

11 

6 

6 

6 

6 

39e. 

2 

3 

11 

12 

16 

17 

8 

8 

8 

8 

39f. 

2 

3 

11 

11 

28 

29 

11 

11 

11 

11 

39g. 

2 

3 

17 

18 

30 

31 

13 

14 

14 

14 

40a. 

D 

4 

11 

12 

11 

12 

4 

4 

4 

4 

40b. 

3 

4 

10 

12 

11 

12 

4 

5 

4 

4 

40c. 

3 

4 

9 

10 

9 

10 

5 

5 

5 

5 

40d. 

3 

4 

10 

11 

14 

15 

5 

6 

6 

6 

40e. 

3 

4 

11 

13 

19 

20 

7 

7 

8 

8 

4(B. 

3 

4 

14 

16 

33 

34 

10 

10 

10 

10 

40*. 

3 

4 

18 

20 

45 

46 

13 

13 

13 

13 

41a. 

5 

10 

11 

13 

12 

12 

4 

4 

4 

4 

41b. 

5 

10 

11 

13 

12 

13 

4 

4 

4 

4 

41c. 

5 

10 

13 

14 

12 

14 

8 

8 

8 

8 

4  Id. 

5 

10 

17 

20 

17 

20 

8 

9 

9 

9 

41e. 

5 

10 

24 

26 

51 

54 

11 

12 

12 

12 

4  If. 

5 

10 

27 

31 

51 

53 

14 

14 

14 

14 

Ug- 

5 

10 

32 

35 

62 

69 

17 

17 

17 

17 

30 


Summary  of  Results  t  Nonlinear  Least-Squares  Methods 
(nambet  of  function  c\'aInations) 


Hanson  Test  Problems 

G-N  LNOElt  DI36  LSQFDQ  LSQSDI 


.17, 

39 

39 

15 

21 

10 

11 

25 

25 

9 

9 

38. 

58 

58 

18 

28 

10 

12 

30 

WEM 

10 

10 

McKeown  Test  Problems 

G-N  UIDEll  DI2G  LSQFDQ  LSQSDI 


39a. 

8 

8 

5 

6 

5 

5 

17 

17 

4 

4 

39b. 

32 

32 

14 

21 

6 

7 

24 

24 

6 

6 

39c. 

23 

23 

18 

25 

7 

8 

22 

23 

9 

9 

39d. 

681 

681 

20 

28 

7 

8 

31 

32 

12 

12 

39e. 

- 

- 

28 

44 

9 

10 

32 

32 

12 

12 

39f. 

- 

- 

31 

44 

14 

15 

43 

43 

25 

25 

39g. 

- 

- 

39 

44 

18 

20 

49 

49 

39 

39 

40a. 

13 

13 

6 

9 

7 

7 

18 

18 

5 

5 

40b. 

16 

16 

14 

17 

7 

11 

19 

19 

6 

6 

40c. 

■SI 

mm 

16 

22 

9 

10 

27 

27 

11 

11 

40d. 

781 

781 

26 

40 

9 

9 

33 

34 

13 

13 

40e. 

- 

- 

■a 

146 

10 

11 

70 

72 

45 

45 

40f. 

- 

- 

180 

272 

13 

14 

92 

92 

49 

49 

40g. 

- 

~ 

1123 

319 

23 

25 

123' 

123* 

85 

85 

41a. 

5 

5 

4 

4 

4 

4 

8 

8 

4 

4 

41b. 

6 

6 

4 

5 

4 

5 

18 

18 

4 

4 

41c. 

12 

12 

6 

8 

6 

6 

21 

21 

5 

5 

4  Id. 

30 

30 

15 

22 

9 

11 

38 

38 

9 

9 

41e. 

222 

222 

29 

38 

17 

20 

47 

47 

14 

14 

4  If. 

933 

933 

57 

89 

24 

27 

54 

54 

16 

16 

4  If. 

3285 

C285 

84 

144 

29 

.30 

62 

62 

21 

21 

.11 


Summary  of  Results  :  Unconstrained  Optimisation  Methods  . 
(number  of  function  evalnslions) 


42s.<*  4  24 

42b.”  4  24 

42c.”  4  24 

42d.”  4  24 

43s.”  5  16 

43b.”  5  16 

43c.”  5  16 

43d.”  S  16 

43c.°  5  16 

43f.”  5  16 


44s.”  6  6 

44b.”  6  6 

44c.”  6  6 

44d.”  6  6 

44€.”  6  6 

45s.”  8  8 

45b.”  8  8 

45c.”  8  8 

45d.”  8  8 

45e.”  8  8 


DeVilliers  and  Glasser  Test  Problems 


103*  104* 


IPSOL 


140*  141* 

51*  52* 


28  28 
35  36 

30  31 

30  30 

22  22 
26  27 

21  21 


112*'  120*'  27*  28* 


56*  59* 


28*  29* 

17  18 


Dennis,  Gay,  and  Vu  Test  Problems 


441  444 


3726  3731 


284  288 


6197  6200 
7929  7934 
3341  3346 


■PSOL 
488  490 


1976  1978 

474  476 


1654  1656 


179  180 


194  195 
187  188 
219  220 
63  64 
15  16 
321  322 
328  329 
351  352 


30*  30* 

17*  17* 

89*  89* 


142*  143* 

37*  37* 


143  144 
46  46 

914  918 

915  916 
475  476 
186  186 
38  38 

1416  1416 
1478  1479 
1441  1441 
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Summary  of  Results  :  Nonlinear  Least-Squares  Methods 
(somber  of  fnnrtion  pvalnstions) 


DeVilliers  and  Glasser  Test  Problems 
G-N  LKDEll  DI26  LSQFOq  LSQSDI 

42a.®  51*  51*  18  19  29*  29*  73*'  73*'  60*'  64*' 

42b.®  611*  -  48*  49*  74*  74*  94*  94*  75*  75* 

42c.®  27*  27*  20*  20*  32*  32*  46*  46*  48*  48* 

42d.®  24*  24*  15*  16*  23  24  27*  27*  27*  27* 

43a.°  23*  23*  14*  IS*  31  32  33*  33*  25*  33* 

43b.®  IS*'  20*  18*  18*  20  20  45*  45*  37*  45* 

43c.®  24*  24*  11*  11*  34'  4l'  33*  33*  25*  33* 

43d.®  18*  18*  22*  23*  17  17  38*  38*  30*  38* 

43e.®  31*  31*  12*  13*  28  29  27*  27*  19*  27* 

43f.® _ 22  22  12  13  20  20  31  31  23  31 

Dennis,  Gay,  and  Vu  Test  Problems 
G-N  LRDEll  DI28  LSQFDQ  LSQSDI 

44a.®  171  171  37  38  58  59  97  97  93  95 

44b.®  5  5  5  6  7  7  10  10  6  10 

44c.®  55  55  108  109  93  94  47  47  4  7  47 

44d.®  35  35  98  99  97  98  40  40  40  40 

44e.®  42  42  82  83  83  83  47  47  47  47 

4Sa.®  171  171  47  48  65  66  97  97  93  95 

4Sb.® _ 5  5  5  6  8  8  12  12  6  12 

45c.®  41  41  164  16.5  129  130  47  47  47  47 

45d.®  35  35  144  145  168  168  42  42  42  42 

45e.®  42  42  130  131  173  173  49  49  43  43 


5.  Appendix  :  Numerical  Results 

In  this  section,  numerical  results  are  presented  for  a  large  set  of  test  problems,  using  software 
based  on  the  unconstrained  optimization  techniques  and  methods  for  nonlinear  least  squares 
problems  discussed  in  Sections  2  and  3. 


5.1  Sources  and  Presentation 

The  following  is  a  list  of  the  software  sources  that  were  used  to  obtain  the  results: 


subroutine 

source 

problem  type 

derivetives 

OHIG/SUIISOI. 

PORT 

unconstrained 

first 

IPSOL 

SOL  /  NAG 

unconstrained 

first 

Dim/BDIISOL 

PORT 

unconstrained 

second 

niA 

NPL  /  NAG 

unconstrained 

second 

G-N 

uses  SOL  /  NAG  LSSOL 

least  squares 

first 

LXDEll 

MINPACK 

least  squares 

first 

DI3G/IL2S0L 

PORT 

least  squares 

first 

LSQFDQ 

NPL  /  NAG 

least  squares 

first 

LsqsDi 

NPL  /  NAG 

least  squares 

second 

y\CM 

MTNPACK 

NAG 

NPL 

PORT 

SOI 


Association  for  Computing  Machinery 
Argonne  National  Laboratory.  U.  S.  A. 

Numerical  Algorithms  Group 
National  Physical  Laboratory,  England 

PORT  Mathematical  Software  Library,  A.  T.  it  T.  Bell  Laboratories,  Inc. 
Systems  Optimization  Laboratory,  Stanford  University 


All  of  the  programs  were  run  in  FORTRAN  using  double  precision  on  the  IBM  3081  and  IBM 
3033  computers  at  Stanford  Linear  Accelerator  Center,  for  which 

relative  machine  precision  r „  =  2.22 . . .  x  lO"’";  =  1 . -13  ...  x  10“*. 


in  the  tables,  associated  with  the  label  'ent.  err.',  we  include  the  quantity 

I  +  ii/»...ii5 


(■5.M) 


where  /*  is  the  value  of  /  at  the  point  of  termination,  and  H/aryfH;  is  tht-  best  available  estimate 
of  the  norm  of  the  solution,  in  order  to  get  some  idea  of  the  error  in  ll/'H].  For  those  problems 


Mi 


that  have  nonaero  residuals,  the  value  of  ||/a,«/||3  is  given  to  six  figures  of  accuracy,  rounded 
down. 

The  following  abbreviations  are  used  in  the  headings  of  the  tables : 

est.  err.  -  error  estimate  (R.l  .1) 

ronv.  -  termination  conditions 

The  following  abbreviations  are  used  in  the  tables  to  describe  conditions  under  which  the  algorithm 
terminates  abnormally  ; 

p  LiM.  -  fnnclion  evaluation  limit  reached 

TIME  time  limit  exceeded 

LOOP  -  subroutine  appears  to  loop 

For  information  on  the  test  problems,  see  Section  5.9. 
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5.2  Trust-Region  Methods 

(POPT/ACM  DHHH/HUMSL  and  DMHG/SUMSL) 


5.2.1  Software  and  Algorithms 

The  results  were  obtained  using  subroutines  DMMH  and  DMIG,  which  are  double  precision 
versions  of  the  ACM  algorithms  RUNSL  and  SUMSL  available  in  the  PORT  Library  [1984],  A 
subproblem  of  the  form 

min  Qk{i»)  =  ffkP 

r€**  ^ 

snbject  to 

is  solved  at  each  iteration  for  the  step  Pk  to  the  next  iterate,  where  Dt  is  a  diagonal  scaling 
matrix,  and  Hk  is  the  exact  Hessian  matrix  at  rn  in  DNIH,  and  a  quasi-Newton  approximation 
in  DMIG  (see  Sections  2.3.2  and  2.4). 


6.2.2  Pnrameters 


Parameters  were  kept  at  their  default  values  with  the  following  exceptions; 


IV(IIXFCAL)  - 

min  {9995).  1000a) 

IVCHXITER)  - 

min  {995)9,  lOOOn) 

V(AFCTOL)  - 

TOL*  ( wied;  see  tables) 

T(RFCTOL)  - 

TOL  ( juried ;  see  tables) 

V(  SCTOL)  - 

fill 

V(  XCTOL)  - 

TOL  (wried;  see  tables) 

V(  XFTOL)  - 

V(  LNAXO)  - 

usually  1.0  (default)  f 

function  r\'alnation  limit 
iteration  limit 

absolute  function  convergence  tolerance 
relative  function  convergence  tolerance 
singnlar  convergence  tolerance 
T  convergence  tolerance 
false  convergence  tolerance 
initial  trust-region  diameter 


t  In  .some  rases  the  default  V(LIIAXO)  =1.0  for  the  inili.al  diameler  of  the  trust-region  was  too 
large  and  overflow  occurred  during  function  ewluation.  These  cases  are  indicated  in  the  table 
by  giving  the  lower  value  of  TfLIfAXO)  that  was  subsequently  iiserl  to  obtain  the  results  in  the 
column  labeled  “init.  diam.''. 

Se«  Dennis,  Gay,  and  Welsch  [1981a,  1981b],  Gay  [1983],  and  PORT  [1984]  for  details 
concerning  the  parameters. 
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B.2.S  Convergence  Criteria 


The  following  quantities  will  be  used  in  describing  the  convergence  criteria 


ol*jerti\'e  function 
object  ire  gradient 
cniTcnt  step 

Newton  step 

Newton  rednetion 

predicted  redaction 
actnal  rednetion 

scaled  distance 


Tk  (=i/7/t) 

Pt,  the  ininiinirer  of  the  snhproMem 


Vh 


{Nj^^g  if  Hk  is  prMsitive  definite; 

uniltfinrd  otherwise. 

_  f  —Q.kiPn)  if  //*  i*  positive  definite; 
**  to  otherwise. 

Pr  =  -QklPk) 

Pa  =  ^k  -  J^{rk  +r») 

„  r.\  -  -  V)).|)  4 


t  Here  r,  denotes  the  ith  component  of  the  vector  r.  Tliere  is  a  provision  for  the  user  to  replace 
the  function  v;  we  nsed  the  default  in  aU  of  the  tests. 

The  convergence  criteria  used  in  DHfR  and  MOG  are  as  follows ; 

•  AbMtlate  fanction  convergence  occurs  at  xt  if 


\rk\  <  ?(*FCTOL). 

a  Hehtive  fanction  convergence  is  intended  to  approximate  the  condition 


(5.2.1) 


^k  -  ^(r*)  <  ?(RFCT0L)  |^»|. 


The  test  actually  used  is 


P„  <¥(I1FCT0L)  1^*1 . 

a  r  convergence  is  intended  to  approximate  the  condition 


(5.2.2) 


i4r4,T*,n4)<T(ICT0L), 


The  test  actually  used  is 

Pk  —  Pit  >nd  »/(r4,T» +pt./?t)  <  ¥(XCT0L).  (5.2.5) 

a  Singnlnr  convergence  is  intended  to  approximate  the  condition 

^k  -  min  {T{y)  |  \\Dk{y  -  Tt)||  <  ¥(LHAXS))  <  ¥(SCT0L)  |^t|, 

where  Pk  is  the  diagonal  scaling  matrix  at  the  il-th  iterate.  The  test  for  singular  convergence 
is  made  only  when  none  of  the  convergence  criteria  listed  above  holds.  It  is  meant  to  indicate 
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relative  function  convegence  when  the  Hessian  in  the  subproblem  is  singular. 

The  actual  test  is 

^»-min  {Gt(w)lllOt(y-»-»)||<V(U!AXS)}  <V(SCT0L)l/^i|.  {S.2.1) 

Under  certain  conditions,  the  test  (5.2.4)  is  repeated  for  a  step  of  length  T(LIUXS). 

•  Fa/se  convergence  is  returned  if  none  of  the  other  criteria  are  satisfied  and  a  trial  step  no 
larger  than  Y(XFCTOL)  is  rejected.  This  usually  indicates  either  an  error  in  computing  the 
objective  gradient,  a  discontinuity  (in  ^  or  9)  near  the  current  iterate,  or  that  one  or  more  of 
the  convergence  tolerances  (V(RFCTOL),  Y(XCTOL).  and  V(AFCTOL))  are  too  small  relative 
to  the  accuracy  to  which  the  objective  is  computed. 

The  test  actually  used  is 

^»-.F(x*+rt)<VCniIERl)/.r  •«<!  +pt,/)t)<V(XFTOL).  (5.2.5) 

where  the  parameter  T(TUIERl)  is  adjustable,  although  in  these  tests  the  default  value  0.1 
is  used  throughout. 

Except  for  (5.2.1),  tests  for  convergence  are  performed  only  when 

/»4  <  2pp.  (5.2.6) 

See  Dennis.  Gay,  and  Welsch  [1981a,  1981b],  Gay  [1983],  and  PORT  [1984]  for  more  dis¬ 
cussion  of  the  convergence  criteria. 

The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates : 

AOS.  F  -  (5.2.1) 

nw,.  F  -  (5.2.2)  anil  (5.2.6) 

X  -  (5.2.3)  and  (5.2.6) 

X,  F  -  (5.2.2)  and  (5.2.3)  and  (5.2.6) 

siNO.  -  (5.2.1)  and  (5.2.6) 

FALSB  -  (5.2.5)  and  (5.2.6) 

F  MM.  •  function  rx-aliialion  limit  reached 
TIME  -  time  limit  exceeded 

LOOP  -  snhrontine  appear*  to  loop 

The  total  number  of  Jacobian  evaluations  is  either  equal  to  the  total  number  of  iterations 
of  the  method,  or  it  is  one  more  than  the  number  of  Iterations.  The  number  in  the  column 
labeled  "iters.  /  J  evels."  is  followed  by  a  "-f"  if  an  extra  Jacobian  evaluation  was  used  in 
the  computation. 


Numerical  Results  for  DNKG 


n 

in 

TOL 

init. 

Hiam. 

/ 

rvab. 

ilcr«./ 

J  «\-ah. 

lk*ll, 

rut, 

err. 
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5.3  Quasi-Newton  (BFGS)  Linesearch  Method 

(SOL/NAG  HPSOL) 

5.3.1  Software  and  Algorithm 

The  results  were  obtained  using  subroutine  IPSOL  from  the  Systems  Optimization  Lab¬ 
oratory  (SOL),  Stanford  University,  also  available  in  the  NAG  Library.  In  IPSOL  a  search 
direction  is  determined  at  each  iteration  from  a  subproblem  of  the  form 

r  mm 

re**  2 

where  the  Hessian  matrix  is  calculated  using  the  BFGS  method  initialized  with  /  (see  Sec¬ 
tion  2.4.2).  This  is  followed  by  a  linesearch  that  uses  both  function  and  gradient  information 
to  obtain  a  steplength  along  the  search  direction  [Gill  et  al.  (1979)]. 

5.3.2  Parameters 

Parameters  were  kept  at  their  default  values  with  the  following  exceptions  t  : 

Infiaite  Bound  Size  -  10’° 

Infinite  Step  Size  10’° 

Iteration  Linit  lOOOOt 

Optiaality  Tolerance  -  varird;  see  tables 
Step  Lifldt  -  nsnnlly  2.0  (default)  t 

t  In  those  cases  (Problems  44c..  d.  (n  =  m  =  6)  and  45r.,  e.  (n  is  m  =  8)  in  which  the 
iteration  limit  was  actually  readied,  the  resnlls  listed  in  the  tables  are  taken  from  first  iteration 
in  whirh  the  number  of  fniiction  ciTinlations  reaches  or  exceeds  lOOOn. 

J  In  some  eases  the  defanit  Step  Lialt  =:  2.0  was  loo  larne  and  overflow  oeenred  dnrinz  function 
rmhiation  in  llie  lineseareh.  These  e.nses  are  indicated  in  the  tables  by  giving  the  lower  value  of 
Step  Lipit  that  was  subsequently  used  to  obtain  the  results  in  the  coin mn  labeled  “Step  Lin". 

See  Gill  et  al.  [1986]  for  details  concerning  the  parameters 

5.3.3  Convergence  Criteria 

The  following  quantities  will  be  used  in  describing  the  convergence  criteria  : 

objective  function  Tt  \  fi) 

objective  gradient  jy*  =  F/i  {=  Jj ft) 

optimality  tolerance 


The  sequence  of  iterates  generated  by  IPSOL  is  judged  to  have  converged  if  the  following 
two  conditions  hold : 


and 

IIpsII,  <  ,^(1  +max  {(1  +l^t|)|L<jr|l,})  (.■>.3.2) 

or  if 

IIpWI,  <  C(i  +  {(I  + IIp^II,)). 

Condition  (.5.3.1)  is  meant  to  ensure  that  the  sequence  {n)  has  converged,  while  conditions 
(.5.3.2)  and  (.5.3.3)  are  intended  to  test  whether  the  requirement  that  the  gradient  vanish  is 
approximately  satisfied  at  r>.  Condition  (.5.3.3)  allows  IPSOL  to  accept  a  point  as  a  local 
mimimum  if  a  more  restrictive  test  on  the  necessary  condition  than  (.5.3.2)  is  satisfied,  but 
condition  (.5.3.1)  does  not  hold.  For  a  detailed  discussion  of  convergence  criteria  similar  to 
these,  see  Section  8.2  of  Gill,  Murray,  and  Wright  [1981]. 

The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates  :  t 

OPT.  -  optimal  point  fonnd 
•  -  cnrrent  point  cannot  he  improved 

**  *  optimal  solution  fonnd,  hut  requested  aernracj’  could  not  he  achieved 

F  tTM.  -  function  evaluation  Umit  reached 

t  A  ■*’  corresponds  to  the  situation  in  which  the  algorithm  terminates  dne  to  failure  in  the 
linesearch  to  find  an  acceptable  step  at  the  cnrrent  iteration.  A  '**’  rvrrnrs  adien  condition 
(5.3.1)  is  satisfied  but  not  condition  (5.3.2)  ;that  is,  conditions  for  optimality  are  met  at  the 
current  point  hut  the  iterates  have  not  yet  converged. 
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Numerical  Results  for  NPSOL 
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5.4  Second'Derivative  (Modified- Newton)  Linesearch  Method 

(NPL/NAG  HIA) 

5.4.1  Software  and  Algorithm 

The  results  were  obtained  using  subroutine  HIA  from  the  National  Physical  Laboratory, 
available  at  Stanford  Linear  Accelerator  Center.  The  algorithm  implements  a  modified  New¬ 
ton  method  in  which  the  search  direction  at  each  iteration  is  the  solution  to  a  subproblem 
of  the  form 


and  the  exact  Hessian  matrix  is  replaced  by  modified  Cholesky  factors  if  it  is  either  indefinite 
or  computationally  singular  (see  Gill  and  Murray  [1974a]  and  Section  2.4.1).  A  step  length 
along  the  search  direction  is  then  computed  by  a  linesearch  method  [Gill  and  Murray  (1974b)] 
that  uses  both  function  and  gradient  information  to  obtain  sufficient  decrease  in  the  objective 
function.  MIA  requires  exact  second  derivatives,  and  is  similar  to  subroutine  E04LBF  from 
the  NAG  Library  [1984],  the  principal  difference  being  that  the  latter  allows  specification  of 
fixed  upper  and  lower  bounds  on  the  variables. 

5.4.2  Parameters 

Parameters  were  kept  at  their  default  values  with  the  following  exceptions  : 

HAZCAL  -  min  {9999,  lOOOw}  function  cvnhintion  limit 
ZTOL  varied;  see  tables  accuracy  in  x 

ETA  0.9  lineseairli  armracy 

STEPHX  -  nsnally  10*  (defanit)  f  maxiiimm  step  for  linesearch 

t  In  some  cases  the  default  STEPHX  =  10*  was  too  large  and  m’erllow  orrnrred  during  function 
evaluation  in  the  linesearch.  These  ca.ses  are  indicated  in  the  table  by  giving  the  lower  \7ilne  of 
STEPHX  that  was  subsequently  used  to  obtain  the  results  in  the  column  labeled  “max.  step". 

See  NAG  [1984]  for  details  concerning  the  parameters. 
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S.4.8  Convergence  Criteria 


The  following  quantities  will  be  used  in  describing  the  convergence  criteria  : 

ohjfctiv^  function  /t  ( =  ^  ) 

objective  gradicnl  /j|,  =  T/*  (=  J^/*) 
search  direction  /»t.  the  miniiitraer  of  the  siibproWem 

steplength  ,  determined  by  the  linesearch 

An  iterate  is  determined  to  be  optimal  by  KIA  if  the  following  four  conditions  hold; 


o^||p^||,(xm-^v^T)(l  +  l|l•^||,) 

(.5.4.1) 

and 

-  /■*  <  (XTOL*  -hf„)(l  +  1^*1) 

(5.4.2) 

and 

||il*||,  <(xm-hel/*)(l-h|^t|) 

(5.4.3) 

and 

)(  positive  definite. 

(5.4.4) 

IIpsII,  <  0.01^. 

(5.4.5) 

A  necessary  condition  for  optimality  is  that  the  gradient  vanish,  and  conditions  (5.4.3)  and 
(5.4.5)  are  intended  to  test  whether  this  requirement  is  approximately  satisfied  at  r^.  Con¬ 
ditions  (5.4.1 )  and  (5.4.2)  are  meant  to  ensure  that  the  sequence  has  converged,  while 
condition  (5.4.4),  together  with  condition  (5.4.3),  implies  that  sufficient  conditions  for  a 
strict  local  minimum  appear  to  hold  at  rt.  Condition  (5.4.5)  allows  HIA  to  accept  a  point  as 
a  local  mimimum  if  a  more  restrictive  test  than  (5.4.1 )  on  the  necessary  condition  is  met,  but 
one  or  more  of  the  other  conditions  for  convergence  do  not  hold.  For  a  detailed  discussion 
of  convergence  criteria  similar  to  these,  see  Section  8.2  of  Gill,  Murray,  and  Wright  [1981]. 

The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates : 

OPT.  -  optimnl  point  found 
•  -  current  point  cannot  be  improved  t 

p  LtM.  •  function  evaluation  limit  reached 
TIME  -  time  limit  exceeded 

t  A  corresponds  to  the  situation  in  w'hirh  the  algorithm  terminates  dne  to  failure  in  the 
linesearch  to  find  an  acceptable  step  at  the  current  iteration. 
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5.5  Gauss-Newton  Methods 


5.5.1  Software  and  Algorithm 

The  software  package  LSSOL  [Gill  et  al.  (1986a)]  is  used  to  solve  the  linear  least-squares 
subproblem  (.t.1.3).  The  linesearch  procedure  used  for  the  numerical  examples  in  this  section, 
requires  both  function  and  gradient  information.  It  is  taken  from  the  nonlinear  programming 
code  IPSOL  [Gill  et  al.  (1979);  (1986b)]. 


5.5.2  Parameters 

Parameters  in  LSSOL  were  kept  at  their  default  values  with  the  following  exceptions  : 

Rank  Tolexanea  -  vnricd,  m'c  tabim 
Infinite  Bound  Size  -  to*" 

See  Gill  et  al.  [1986a]  for  details  concerning  the  parameters. 

In  addition,  the  following  parameters  are  chosen  for  the  linesearch  ; 

n  -  0.5 

min  {(100(1 +  |)r||,)-H)/l|p||,.10»")  t 

t  In  some  cases  the  defanlt  wine  was  too  large  and  os-erflow  occurred  during  function 
c^'al^alion  in  the  linesearch.  These  cases  are  indicated  in  the  tables  by  giving  the  value  7  <  100 
such  that  ==  min{(7(t  +  HrH,)  +  l)/||rii7<  IH^))  snbseqnently  used  to  obtain 

the  results  in  the  column  labeled  “step  fac.”. 

See,  e.  g..  Gill,  Murray,  and  Wright  [1981]  for  a  discussion  of  the  linesearch  parameters. 


5.5.3  Convergence  Criteria 


Convergence  is  judged  to  have  occurred  at  the  l-th  iterate  if  either 


IIMI,  < 


Il.«nll,<r^(’(i+1IA1I,). 

The  algorithm  is  also  terminated  if  there  is  a  negligible  change  in  r, 


(.5..'>.1) 


(5.5.2) 


no 


(5.5.3) 


where  <>»  is  the  step  length  determined  by  the  iinesearch. 


5.5.4  Table  Information 

Under  the  label  'conv.'.  the  following  notation  is  used  to  describe  conditions  under  which 
the  algorithm  terminates  : 


AB5.  P  - 

(5.5.1) 

a 

(5.5.2) 

X 

(5.5.5) 

P  LIM.  - 

function  evaluation  limit  reached 

A  superscript  ®  following  a  problem  number  indicates  a  zero-residual  problem. 

A  superscript  following  a  problem  number  denotes  a  linear  least-squares  problem. 
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5.6  Levenberg’Marqunrdt  Method 

(MIXPACK  LKDER) 

5.6.1  Software  and  Algorithm 

The  results  were  obtained  using  the  MINPACK  subroutine  LMDER,  which  implements  a 
Levenberg-Marquardt  method  using  exact  derivative  information.  A  subproblem  of  the  form 

rain  fitfp)  s  2gjp  + 

•subject  to  ||/7*Hl3  ^ 

is  solved  at  each  iteration  for  the  step  pt  to  the  next  iterate,  where  Dk  is  a  diagonal  scaling 
matrix. 

5.6.2  Parameters 

The  results  were  obtained  using  the  MINPACK  subroutine  LMDER,  with  the  following 
input  parameters  : 

XTOL  -  varied,  see  tnhies  arrnrney  in  r 

FTOL  -  ^•aried,  see  tables  aernracy  in  sum  of  sqnares 

6T0L  -  0.00  gradient  norm  tolerance 

NAXFEV  -  min  {0000,1000*  »}  fnnri ion  evalnal ion  limit 

MODE  -  1  speciRes  internal  scaling 

FACTOR  -  100.  (defanlt)  initial  step  magniRcation 

t  In  .some  cases  the  defanlt  FACTOR  =  100.0  was  too  large  and  overflow  occurred  during  function 
es'nlnalion.  These  ca.ses  are  indicated  in  the  table  by  giving  the  lower  valne  of  FACTOR  that  was 
subsequently  nsed  to  obtain  the  results. 

For  details  about  these  parameters,  MorA,  Garbow,  and  Hillstrom  [1980], 


5.6.S  Convergence  Criterin 


The  following  quentities  will  be  used  in  describing  the  convergence  criteria  ; 

rcKidiml  vector  fi^t) 

ith  residnal  gradient 
■Tacobian  matrix  ./(tj  ) 

objective  function  = /(rt)'''/(rt) 

objective  gradient  gt  =  V^(Tt)  =  2.J{rt)'^ /(r*) 

cnirent  step  the  minimizer  of  the  snbproblem 

predicted  reduction  o-  =  'i-Alb  ~  ^1'^* 

ItMiz  HAH, 

actual  reduction  n.  =  H/(^* )Ha  ~  'I'  /** ill,  _  )  ~  +  Pt) 

HAH,  "  HAH, 

Criteria  for  termination  of  UfDER  at  rt  are  as  follows ; 

•  T  coiMcrgence.  Both  actual  and  predicted  reductions  in  the  sum  of  squares  are  at  most 
FTOl. 

|/•A|<FT0L  and  pp<fT0L  and  Pa  <  2pr  (5.6.1) 

This  attempts  to  guaranty  that 

HAH,  <(1+^T0L)||/(r*)H,• 

#  I  con»rrgence.  Relative  error  between  two  consecutive  iterates  is  at  most  XTOL. 

<XT0L||rt  +  p»||,  (5.6.2) 

This  attempts  to  guarantee  that 


-  r*)||j  <  XTl)L||/?t(r‘)||y. 

•  The  cosine  of  the  angle  between  /*  and  any  column  of  ./r  is  at  most  GTOL  in  absolute 
value. 


I  )■’■/*  1 

nJ, ;  ; — ,7  <  6T0L 

i<f<™  ||^<^>(*'»)II,IIAH, 


(5.6.3) 


This  approximates  the  necessary  condition  g(rt)  =  0. 

•  FTOL  is  too  small.  No  further  reduction  in  the  sum  of  squares  is  possible. 


Ip-iI  <  f»4  *nd  pr  <  f„  and  p^  <  2p, 


(5.6.4) 
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•  XTOL  is  too  small.  No  further  improvement  in  the  approximate  solution  is  possible. 


+  ptilj 


(5.6.5) 


•  6T0L  is  too  small.  is  orthogonal  to  the  columns  of  .It  to  machine  precision. 

Except  for  test  (5.6.3),  tests  for  convergence  are  performed  only  when 


(5.6.6) 


/I4  <  o.oooip^.  (5.6.7) 

The  convergence  criteria  are  described  in  more  detail  in  Mori,  Garbow,  and  Hillstrom  [1980]. 
The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates ; 


r  -  (5.6.1)  and  (5.6.7) 

X  -  (5.6.2)  and  (5.6.7) 

X.  r  -  (5.6.1)  and  (5.6.2)  and  (5.6.7) 
a  .  (5.6.6)  and  (5.6.7) 

r  LlM.  •  function  cvalnation  limit  reached 
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n 
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5.7  Adaptive  Special  Quasi>Newton  Method 

(rORT/ACM  DM2G/ML2S0L) 

5.7.1  Software  and  Algorithm 

The  results  were  obtained  using  subroutine  DI2G,  a  double  precision  version  of  the  ACAf 
algorithm  IL2S0L  available  in  the  PORT  Library  [1984],  A  subproblem  of  the  form 

mjn  Qk{p)=9jp+]rP^{Jk  h  +  Bk)p 

r€“"  2 

snhjwt  to  IlZ>tp|l5  < 

is  solved  at  each  iteration  for  the  step  pk  to  the  next  Iterate,  where  /)»  is  a  diagonal  scaling 
matrix.  The  method  is  adaptive,  so  that  Bk  is  sometimes  null  and  sometimes  a  scaled  quasi- 
Newton  approximation  to  the  part  of  the  Hessian  involving  the  second  derivatives  of  /. 

5.7.2  Pturameiers 

Parameters  were  kept  at  their  default  values  with  the  followinj^ exceptions; 


IV(IfZFCAL) 
IVCNZITER} 
?(AFCT0L) 
TCRFCTOL) 
V(  SCTOL) 
VC  ZCTOL) 
VC  XFTOL) 
VC  UUZO) 
VC  LMAZS) 
VCTUIERl) 


min  {9<)99, 1000«n} 
min  {9999, 1000  ♦  w} 
T0L*T0L  (varied;  see  tables) 
TOL  (varied;  see  tables) 

TOL  (varied;  see  tables) 

nsnally  1.0  (defanll)  t 
1.0  (defanll) 

0.1  (default) 


function  evalnation  limit 
iteration  limit 

absolute  function  convergence  tolerance 
relative  function  convergence  tolerance 
singular  convergence  tolerance 
T  convergence  tolerance 
false  convergence  tolerance 
initial  trust-region  diameter 
step  bound  ff»r  singular  convergence  test 
red  net  ion  test  coelRcienl 


I  In  some  cases  the  default  VCLRIZO)  =1.0  for  the  initial  diameter  of  the  trust-region  s-as  too  large 
and  overflow  occurred  during  function  cmliiation.  These  cases  arc  indicated  in  the  tabic  by  giving 
the  lower  value  of  V(LI(4X0)  that  was  subsequently  used  to  obtain  the  results  in  the  column  labeUd 
“init.  diam.”. 

See  Dennis,  Gay,  and  Welsch  [1981a,  1981b].  Gay  [1983],  and  PORT  [1984]  for  details  concerning 
the  parameters. 


5.7.3  Convergence  Criteria 


The  following  quantities  will  be  used  in  describing  the  convergence  criteria  : 


objective  function 
objective  giftdient 
current  step 

Newton  step 

Newton  reduction 

prrHlicted  reduction 
actual  reduction 

scaled  distance 


=  Slh 

g,  =  Vjr,  = 


Pi,  the  minimizer  of  the  subproblem 


Pn 


Ph 


if  Hk  is  positive  definite; 
undefined  otherwise. 


-CafPw)  if  Ifk  is  positive  definite; 

r\l  ImriiMSA 


pp  =  -Qk(Pk) 

Pa=  -  T{2k  +  Pk ) 


v(T.y,D) 


inaxi<,<,  {|(/?(t  -  y)V|} 
maxi<,<,  {|(Dr)'|  +  KDirVI}’ 


t  Here  t*,-  denotes  the  tth  component  of  the  vector  n.  There  is  a  provision  for  the 
replace  the  function  v,  we  used  the  default  in  all  of  the  tests. 


The  convergence  criteria  used  In  DI26  are  as  follows: 
e  Absolute  function  convergence  occurs  at  ra  if 


|.r*|  <  »(iiFcm). 


•  Behtive  function  convergence  is  intended  to  approximate  the  condition 

J^k  -  ^(r,)  <  f(»FCT0L)  l^ai . 


The  test  actually  used  Is 

p„  <  fCEFCTOL)  |7il. 

•  r  convergence  is  intended  to  approximate  the  condition 

i/(T»,/*,Da)<  v(XCTOL). 


The  test  actually  used  is 


Pk=Ps  and  +  pa,D*)  <  T(XCTOL). 

e  Singulnr  convergence  is  intended  to  approximate  the  condition 

Tk  -  min  j  |lDa(y  -  xa)||  <  f(LliAXS)}  <  f(SCTOL)  [^k]. 


n.ser  to 


(5.7.1) 


(5.7.2) 


(5.7.3) 


where  D),  is  the  diagonal  scaling  matrix  at  the  il-th  iterate  —  when  none  of  the  convergence 
criteria  listed  above  hold.  It  is  meant  to  indicate  relative  function  convegence  when  the  Hessian 
in  the  subproblem  is  singular. 

The  actual  test  is 

-  min  {C>*(y)  I  \\Dk{y  -  J-*)!!  <  V(LWXS)}  <  »(SCT0L)  \Ti,\.  (5.7.4) 

Under  certain  conditions,  the  test  is  repeated  for  a  step  of  length  V(LNAXS). 

•  Fa/$o  convergence  is  returned  if  none  of  the  other  convergence  criteria  are  satisfied  and  a  trial 
step  no  larger  than  V(XFCTQL)  is  rejected.  This  usually  indicates  either  an  error  in  computing 
the  objective  gradient,  or  a  discontinuity  (in  T  or  near  the  current  iterate,  or  that  one  or  more 
of  the  convergence  tolerances  (V(RFCTOL),  V(XCTOL),  and  V(AFCTOL))  are  too  small  relative 
the  accuracy  to  which  the  objective  is  computed. 

The  test  actually  used  is 

-  ^{fk  +Pk)<  »(T0IE11)/*,.  and  H^k,  +  Pk ,  Dk)  <  TflFTOL),  (5.7.5) 

where  the  parameter  T(TOIERl)  is  adjustable,  although  in  these  tests  the  default  value  0.1  is 
used  throughout. 

Except  for  test  (5.6.13),  tests  for  convergence  are  performed  only  when 

Pa  <  2/»r-  (5.7.6) 

See  Dennis,  Gay,  and  Welsch  [1981a,  1981b],  Gay  {1981).  and  PORT  [1984]  for  more  discussion 
of  the  convergence  criteria. 

The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates ; 

ABS  p  -  (5.7.1) 

REL.  F  -  (5.7.2)  anrl  (5.7.6) 

X  -  (5.7.3)  and  (5.7.6) 

x.F  -  (5.7.2)  and  (5.7.3)  and  (5.7.6) 

5ING.  -  (5.7..1)  and  (5.7.6) 

FALSE  -  (5,7.5)  and  (5.7.6) 

F  LiM.  -  function  evaluation  limit  rpached 
TIME  -  time  limit  exceeded 

LOOP  -  siihrontine  appears  to  loop 

The  total  number  of  Jacobian  evaluations  is  either  equal  to  the  total  number  of  iterations  of  the 
method,  or  it  is  one  more  than  the  number  of  iterations.  The  number  in  the  column  labeled  “iters. 
/  .7  evals."  is  followed  by  a  if  an  extra  Jacobian  evaluation  was  used  in  the  computation. 
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5.8  Corrected  Gauss-Newton  Methods 

(NPL/NAG  LSQSDN  and  LSQFDQ) 


5.8.1  Software  and  Algorithms 

The  results  were  obtained  using  subroutines  LSQSDI  and  LSQFDQ  implementing  corrected 
Gauss- Newton  methods  from  the  National  Physical  Laboratory,  which  are  available  at  Stan¬ 
ford  Linear  Accelerator  Center.  A  subproblem  of  the  form 

min  ajp  +  i  h  +  Bk  )p 

subject  to  fipas  -h< 

is  solved  for  a  search  direction  Pk,  where  a*  is  interpreted  in  a  least-squares  sense  using 
the  singular-value  decomposition  (see  Chapters  3  and  4).  Subroutine  LSQSDI  requires  exact 
second  derivatives  for  the  term  Bk  that  involves  the  second  derivatives  of  the  residuals, 
while  LSQFDQ  uses  a  quasi-Newton  approximation.  The  linesearch  algorithm  used  within  the 
subroutines  requires  both  function  and  gradient  information  (see  Gill  and  Murray  [1974], 
for  details).  These  subroutines  are  similar  to  those  available  in  the  NAG  Library  [1984]  for 
solving  nonlinear  least-squares  problems  :LSQSD1  corresponds  to  NAG  subroutine  E04HEF 
and  LSQFDQ  to  NAG  subroutine  E046BF. 


5.8.2  Pnrmeters 


LSQSDI  and  LSQFDQ  have  the  same  set  of  input  parameters  as  the  corresponding  software 
from  the  NAG  Library  [1984].  The  values  chosen  are  listed  below. 


NAXCAL 

ZTOL 
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STEPKZ 


min  [flflfW,  1000 •  n) 
iTiried:  see  tables 
0.9 

usually  10*  (default)  t 


function  cvalnation  limit 
arm  racy  in  r 
linrscarrh  acniracy 
maximum  step  for  linrscarrh 


t  In  some  cases  the  default  STEPIX  =  10*  was  too  large  and  overflow  occurred  during  function 
evaluation  in  the  linesearch.  The.se  cases  are  indicated  in  the  tables  by  giving  the  lower  value  of 
STEPMX  that  was  subsequently  used  to  obtain  the  results. 

See  the  NAG  [1984]  manual  for  details  concerning  the  parameters. 
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5.8.3  Convergence  Criteria 


The  following  quantities  will  be  used  in  describing  the  convergence  criteria  ; 


objective  function 
objective  gradient 
search  direction 
steplength 


T,  =  /7a 

=  r/v  =  2J7A 

pi-,  the  mininiirer  of  the  snbproblem 
o  j,  determined  by  the  linesearch 


An  iterate  is  determined  to  be  optimal  by  LSQSDI  and  LSQFDQ  if 


(5.8.1) 

or 

\\9k\\,  <  (mHM!,  (5.8.2) 

or  if  the  following  three  conditions  hold  ; 

«*l|pt||,  <  (XTOL  +  e,.)(l  +  llrtll,)  (5.8.3) 

and 

-Ti<  (XTOL  +  c„)’(l  +  mi)  (5.8.4) 

and 

llQ*||,<fl^’(l  +  mi).  (5.8.5) 


Conditions  (5.8.3)  and  (5.8.4)  are  meant  to  ensure  that  the  sequence  has  converged, 
while  conditions  (5.8.2)  and  (5.8.5)  are  intended  to  test  whether  the  necessary  condition 
that  the  gradient  vanish  at  a  minimum  is  approximately  satisfied  at  n.  Condition  (5.8.3) 
allows  the  algorithm  to  accept  a  point  as  a  local  mimimum  if  a  more  restrictive  test  on  the 
necessary  condition  than  (5.8.5)  is  met.  even  if  conditions  (5.8.3)  and  (5.8.1)  do  not  hold. 
For  the  zero-residual  case,  condition  (5.8.5)  specifies  that  the  method  may  also  terminate 
when  IIAII]  is  no  larger  than  the  relative  machine  precision.  For  a  detailed  discussion  of 
convergence  criteria  similar  to  these,  see  Sections  8.2  and  8.S  of  Gilt,  Murray,  and  Wright 
[1981].  In  particular.  Section  8. 5. 1.3  treats  special  considerations  relevant  to  nonlinear  leas' 
squares. 
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The  following  abbreviations  are  used  in  the  tables  to  describe  the  conditions  under  which  the 
algorithm  terminates  : 

orT  -  optimal  point  found 

*  -  rnrronl  point  raiir.ot  he  improvcrl  t 

p  I,IM  function  evaliitaion  limit  reached 

TIMK  -  (iinc  limit  exceeded 

t  A  corresponds  to  the  situation  in  which  the  algorithm  terminates  due  to  failure  in  the 
linesearch  to  find  an  acceptable  step  at  the  current  iteration. 


m 


r 

i 

Numoriral  Rcpulfs  f<ir  LSQFDQ 
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OPT 

II 

II 

18 

17 

10"* 

10"® 

10"'* 

10"'* 

OPT. 

14.® 

4 

6 

10~* 

87 

12 

2.00 

10"* 

10"* 

10"'* 

OPT 

■OH 

93 

45 

2.00 

10"* 

10"* 

10"'* 

* 

15. 

4 

11 

16 

n 

10"* 

10"'^ 

10"® 

OPT 

10"'* 

16 

H 

K1 

10"* 

10“'* 

10"® 

OPT 

16. 

4 

45 

22 

17.6 

10* 

10"* 

10"* 

* 

10"'* 

45 

22 

17.6 

10* 

10"* 

10"* 

♦ 

17. 

5 

33 

14 

H 

m 

10"* 

10"* 

10"" 

OPT 

18 

U 

10"* 

10"® 

10"" 

* 

18.® 

6 

13 

243 

Tf 

10"’* 

10"" 

10-** 

OPT 

247 

TIM 

10"’* 

10"" 

10-** 

p 

19. 

11 

65 

19 

10 

9..18 

10"' 

10"® 

10"* 

OPT. 

19 

10 

9.38 

10"' 

10"® 

10"* 

OPT 
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Niimeriral  Rosnlts  for  LSQSDN 


n 

ni 

XTOL 

IDAX. 

step 

LJ 

cvals. 

il<*rs. 

lk*ll: 

ll/'ll. 

Il9*ll, 

est. 

err. 

conv. 

20a. 

fi 

11 

lO'* 

9 

2.14 

10-" 

10-* 

10-10 

* 

10“'* 

9 

7 

2.44 

10“* 

10-* 

10- "’ 

* 

20)>. 

9 

31 

10~* 

6 

5 

6.06 

10-* 

10- " 

10-'* 

orT. 

10~'* 

G 

5 

6.06 

10-* 

10- " 

10-'* 

OPT 

20c. 

12 

31 

10"* 

6 

5 

16.6 

10-* 

10"'* 

10-'® 

OPT 

10"'* 

6 

5 

16.6 

10-* 

10-'* 

10-'® 

OPT 

20d. 

20 

31 

10 

8 

EH 

10-10 

10-" 

10-"* 

OPT. 

12 

9 

10-'*’ 

10- " 

10“** 

OPT. 

21a.'’ 

10 

10 

31 

12 

3.16 

0.00 

0.00 

OPT. 

10-’" 

31 

12 

.3.16 

0.00 

0.00 

OPT. 

21b.'' 

20 

20 

UM 

■B 

B9 

■H 

4.47 

0.00 

0.00 

0.00 

OPT 

IbO 

B9 

■9 

4.47 

0.00 

0.00 

0.00 

OPT, 

22a.'’ 

12 

12 

18 

17 

io-‘ 

10-" 

10-'* 

10-'* 

OPT. 

18 

17 

10-* 

10-" 

10-'* 

10-'* 

OPT. 

22b.'’ 

20 

20 

HRSI 

18 

17 

10-* 

10-" 

10-’* 

10-'* 

OPT. 

CB 

18 

17 

10-* 

10“" 

10-'* 

10-’* 

OPT. 

23a. 

4 

5 

10-* 

58 

32 

10-* 

10-" 

10-'° 

OPT, 

10-'" 

58 

32 

.500 

10-* 

10- " 

lO-’o 

OPT. 

23b. 

10 

11 

■H 

mvm 

64 

.500 

10-* 

10- '» 

10-’’ 

OPT. 

BUB 

64 

.500 

10-* 

10-’* 

10-’’ 

OPT. 

24a. 

n 

136 

.759 

10-* 

10- " 

10- " 

OPT 

m 

136 

.759 

10-* 

10-" 

10-’’ 

OPT. 

24b. 

10 

20 

HI 

88 

.598 

10-" 

10- " 

10'" 

OPT. 

■Hi 

88 

.598 

10-’ 

10- " 

10'" 

OPT. 

25a.‘’ 

10 

12 

12 

10 

.3.16 

10-" 

10-* 

10-’* 

OPT. 

17 

12 

3.16 

10-* 

10-* 

10-'* 

• 

25b.'’ 

20 

22 

10"* 

14 

12 

4.47 

10"* 

10-" 

10-’" 

OPT. 

10-'" 

19 

14 

4.47 

10-" 

10- " 

10-'" 

0 

2ea.'' 

in 

10 

18 

9 

.306 

10-" 

10-" 

10-"» 

OPT. 

22 

11 

.306 

10- " 

10-” 

10-"" 

« 

2Cb.'’ 

20 

20 

,0-'" 

10- '0 

10-'" 

OPT. 

HI 

10"  "’ 

10- "’ 

10-'" 

0 

2ra.'’ 

10 

10 

10-" 

10" 

22 

7 

.3.18 

10-in 

10-" 

10- JO 

OPT. 

10-'" 

lo" 

28 

10 

.3.18 

lO-’o 

10-" 

10- JO 

• 

2rb.'’ 

20 

20 

10-" 

10" 

21 

/ 

4.47 

10-" 

10-" 

10-'" 

OPT 

10-’" 

10" 

27 

10 

4,47 

10-" 

10-" 

10-'" 

* 

28a.'’ 

10 

10 

10-" 

BH 

mm 

10-'* 

10-'" 

10-*' 

OPT. 

10-'" 

HH 

wa 

10"'* 

10-'" 

10-*' 

OPT. 

281).'’ 

20 

20 

10"* 

4 

3 

.571 

10-'" 

10- 1« 

10-*" 

OPT. 

10-'" 

4 

3 

.571 

10-'" 

10-'® 

10-** 

OPT, 
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Numrriral  for  LSQSDN 


n 

fll 

XTOL 

max. 

st^p 

oval*. 

iter*. 

n 

11.7*11, 

rst. 

err. 

conv. 

20a.'’ 

10 

10 

10“* 

mm 

10"” 

10-'* 

10-*® 

OPT. 

I0~’* 

■ta 

10-” 

10-'* 

10-*® 

OPT. 

29b.’ 

20 

20 

10-* 

6 

4 

.5T1 

10-” 

10-'* 

10-’* 

OPT. 

10"'* 

6 

4 

.571 

10-” 

10-'* 

10-’* 

OPT 

30a.° 

10 

10 

10-* 

7 

10-® 

10-® 

10-'« 

OPT 

10-’’ 

11 

10-® 

10-* 

10-'* 

aob.” 

20 

20 

10-* 

1 

5 

3.04 

10-* 

10-® 

10-'* 

OPT. 

10- ” 

11 

7 

3.04 

10-® 

10-* 

10"'* 

« 

31a." 

10 

10 

I0-* 

8 

6 

1.80 

10-* 

10-’ 

10-16 

OPT 

O 

1 

12 

8 

1.80 

10-* 

10-’ 

10-*« 

p 

31b.‘’ 

20 

20 

10-* 

8 

6 

2.G6 

10-* 

10-’ 

10-'* 

OPT 

10-” 

12 

8 

2.66 

10-* 

10"^ 

10-'* 

p 

32.*^ 

10 

20 

10-* 

2 

1 

.1.16 

>0® 

10-'* 

10-** 

OPT. 

1 

O 

2 

1 

3.16 

10® 

10"'* 

10"'* 

OPT. 

33.' 

10 

20 

10“* 

2 

1 

1.16 

10® 

10-* 

10-* 

OPT 

10-” 

2 

1 

1.16 

10® 

10-® 

10"* 

OPT. 

34.'- 

10 

20 

10-* 

2 

1 

1.78 

10® 

10"* 

10"* 

OPT. 

10-” 

2 

1 

1.78 

10® 

to-® 

10"* 

OPT. 

35a. 

8 

8 

10-* 

Kl 

19 

1.65 

10-' 

10-’ 

10"® 

p 

O 

1 

Bui 

19 

1.65 

10-' 

10-’ 

10"® 

p 

35b.‘’ 

9 

9 

10-* 

30 

12 

1.73 

10- " 

10"*® 

10-” 

OPT. 

O 

1 

M 

34 

14 

1.73 

10- " 

10-*® 

10"” 

p 

35c. 

10 

10 

10-* 

•13 

10 

1.81 

10-' 

10"*® 

10-* 

OPT. 

10- ” 

43 

10 

1.81 

10-' 

10"*® 

10-* 

OPT. 

36a.‘’ 

4 

4 

10-* 

38 

10"'* 

10-'* 

10-*' 

OPT- 

10-” 

38 

10-'* 

10"'* 

10-*' 

OPT. 

36b.® 

9 

9 

10- * 

38 

22 

10-'* 

10-** 

10-** 

OPT. 

10- ” 

38 

22 

10-'* 

10"** 

10-*' 

OPT. 

36c.® 

9 

9 

10-" 

MM 

1.73 

10-'* 

10"'* 

10-*’ 

OPT. 

10-” 

WM 

1.73 

10"'® 

10-'* 

10-*’ 

OPT. 

36d." 

9 

9 

10-* 

84 

2.13. 

10-'" 

10"® 

10-’® 

OPT 

10-” 

380 

84 

2.13. 

10-'® 

10"® 

lo-*® 

OPT 

37. 

2 

16 

10-* 

9 

8 

8.85 

10* 

10-'® 

10"* 

OPT 

10-” 

9 

8 

8.85 

10' 

10-'® 

10"* 

OPT 

38. 

3 

16 

10-* 

10 

8 

EH 

10' 

10-® 

10"* 

OPT. 

I0-” 

10 

8 

MM 

10' 

10-® 

10"* 

OPT. 
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Numerical  Results  for  LSQSDN 


til 

XTOL 

max. 

%lep 

cmb. 

iter*. 

Ik-ll, 

11.7*11, 

ost. 

err. 

ronv. 

39a. 

2 

3 

10~" 

4 

3 

10“® 

10-' 

10-* 

orT 

o 

1 

4 

3 

10-* 

10-' 

10-* 

OPT 

39b. 

2 

3 

lO'* 

10"* 

OPT 

10-'* 

10-* 

OFT. 

39c. 

2 

3 

10-* 

9 

5 

10-* 

10-' 

10-* 

OPT 

10-'* 

f> 

5 

10-* 

10-' 

10-* 

OPT 

39d. 

2 

3 

10-* 

12 

7 

10-* 

10"' 

10-'® 

10“* 

OPT 

10-'* 

12 

7 

10"* 

10-' 

10-'® 

10“* 

OPT 

39«. 

2 

3 

10-* 

12 

7 

10-* 

10-' 

10“* 

OPT 

10-'* 

12 

7 

10-* 

10-' 

10"* 

OPT. 

39f. 

2 

3 

10-* 

25 

to 

10-' 

10-'* 

10“* 

OPT 

10-'* 

25 

10 

10-' 

10-'* 

10-* 

OPT 

99g. 

2 

3 

10-* 

39 

15 

10-'® 

10-' 

10-* 

10“* 

* 

10"'* 

39 

15 

10-'" 

10-' 

10-* 

10“* 

40a. 

.3 

4 

10“* 

5 

4 

10-* 

10® 

10-* 

OPT, 

10-'* 

5 

4 

10-® 

10® 

10“* 

OPT. 

40b. 

3 

4 

10-* 

6 

4 

10-* 

10® 

10“'® 

10"* 

OPT. 

o 

1 

M 

6 

4 

10-* 

10® 

10-'® 

10-* 

OPT. 

40c. 

3 

4 

10-* 

11 

10-* 

OPT. 

7 

o 

11 

10-* 

OPT. 

40d. 

3 

4 

10-* 

13 

7 

10- * 

10® 

10-16 

OPT. 

10-'* 

13 

7 

10-* 

10® 

10-'® 

OPT 

40c. 

3 

4 

10-* 

45 

16 

10-* 

10® 

10-'* 

10-* 

OPT. 

7 

o 

45 

16 

10-* 

10-'* 

10-* 

OPT 

40f. 

3 

4 

10-* 

49 

18 

10“* 

OPT 

10-’* 

49 

18 

10- * 

OPT 

40g. 

3 

4 

10-" 

85 

24 

10-* 

10-1® 

OPT. 

10-'* 

85 

24 

10-® 

10-'® 

OPT. 

41a. 

5 

10 

10-* 

mm 

10-* 

OPT. 

10-'* 

■i 

10-* 

OPT. 

41b. 

5 

10 

10-* 

■■ 

10® 

10-® 

in-* 

OPT 

10-'» 

U 

10" 

10-® 

10-* 

OPT. 

41c. 

Ti 

10 

10-* 

MM 

10-* 

opT 

10-'* 

MM 

10-* 

OPT. 

41d. 

5 

10 

ESI 

9 

7 

10-* 

10® 

10-'* 

OPT 

BO 

9 

7 

10"* 

10-'* 

OPT. 

41c. 

ESI 

14 

10 

10-* 

OPT 

BO 

14 

10 

10-* 

OPT. 

41f. 

5 

■m 

16 

12 

10-* 

10® 

10-'® 

10“* 

OPT, 

BD 

16 

12 

10-* 

10® 

10-'® 

10-* 

OPT. 

41g. 

5 

1C 

■TS9 

21 

15 

10-" 

10" 

10-* 

OPT. 

10-'* 

21 

15 

10-* 

10® 

10-* 

OPT 
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Nunirriral  Rosnlts  for  LSQSDN 


n 

til 

XTOL 

max. 

slop 

/.  / 

Ik'll, 

ll/'ll, 

ll.<7-||, 

rst. 

rrr. 

roiiv. 

42a.'' 

D 

BSB 

5.0 

60 

41 

6.3.5 

IQ-in 

I0-' 

10-'® 

urT 

■ 

IBB 

5.0 

61 

43 

6.3.5 

10-'" 

10-* 

10-'® 

* 

42b.'’ 

D 

HSBI 

IBI 

75 

67 

61.9 

lO"*' 

10-* 

10-" 

* 

II 

|U 

B9 

75 

67 

61.9 

lO"" 

10-' 

10-" 

42r.'' 

D 

BSEi 

IQS 

28 

61.8 

10-” 

10-® 

10-** 

nrr 

II 

IQB 

IQjI 

28 

61.8 

10-'* 

10-® 

10-** 

orr 

42d.'’ 

D 

lEEl 

IS 

m 

19 

60.3 

10-' 

10-* 

10-'" 

* 

II 

QB 

IQQ 

WM 

19 

60.3 

I0-* 

10-* 

10-'" 

438.” 

mi 

It 

10-'“ 

10“* 

10-'* 

r»PT 

Bfl 

18 

IQIQ 

10-® 

10-* 

10-'* 

43b.° 

IBBi 

BB 

51.0 

10-* 

10“* 

10“'* 

OPT 

■IHI 

IQB 

BB! 

54.0 

10-* 

10-* 

10“'* 

* 

43c.® 

BB 

10.0 

25 

14 

54.0 

10-* 

I0-* 

10“'* 

OFT 

IQB 

10.0 

33 

18 

54.0 

10-* 

10-* 

10"'* 

F 

43d.® 

B 

BQD| 

10.0 

30 

16 

ffS 

10-® 

10-* 

10“'* 

OFT 

B 

IQB 

10.0 

38 

20 

10-® 

10-* 

10"'* 

* 

43e.® 

1081 

10.0 

19 

10 

51.0 

10~'® 

10-* 

10“*® 

OFT 

IQB 

10.0 

27 

14 

54.0 

10-'® 

10-* 

10-*® 

* 

43f.® 

IBB 

Bl 

BB 

10-* 

10-* 

10“'* 

OFT 

QB 

BB 

10-® 

10-* 

10“'* 

* 

44a.® 

6 

IBB 

■QH 

4.06 

10-'* 

10-'® 

10-** 

OFT 

Bm 

QB 

4.06 

10-'* 

10-1® 

10-** 

OFT 

44b.® 

6 

6 

HUI 

6 

4 

.3..52 

10-* 

10-* 

10-14 

OPT. 

I0~” 

10 

6 

3..52 

10*  * 

10-* 

10-'* 

* 

44c.® 

BQ 

BB 

msrm 

10-* 

10-* 

10“'* 

* 

BB 

10“* 

10"* 

10-'* 

• 

44d.® 

40 

17 

15.3 

I0-* 

10-* 

10-'* 

• 

40 

17 

15.3 

10-® 

10-' 

10-'* 

« 

44e.® 

IQS 

bb 

lO-'-' 

10-1® 

10-*' 

OFT 

QB 

BB 

10-'* 

10-'® 

10“*' 

OFT 

45a.® 

BQ 

26 

4.06 

10-'* 

10-'" 

10-** 

OFT 

QB 

27 

4.06 

10-'* 

10-'® 

10“** 

OFT 

45b.® 

8 

8 

bb 

|B^9 

10-* 

10-" 

10-'* 

OFT 

QQI 

IBI 

10-* 

10-" 

10-'* 

» 

45c." 

8 

8 

bb 

BB 

10-* 

10-' 

10“’* 

♦ 

IBQ 

bB 

IQQ 

10-* 

10-* 

10“" 

* 

45d.® 

8 

8 

10"* 

42 

18 

■Efl 

10-® 

10-* 

10“'* 

10-'* 

42 

18 

Q^Q 

10-® 

10-* 

10“'* 

* 

45c.® 

8 

43 

18 

9.3 1 

10-'* 

10-'® 

10-** 

OFT 

43 

18 

9.31 

10“'* 

10-'® 

10-*' 

OFT 
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5.9  Test  Problems 


Superscripts  on  problem  numbers  have  the  following  interpretation  : 

°  ;  zero* residual  problem 
^  ;  linear  least-squares  problem 


i  Problems  from  Mor^,  Garbow,  and  Hillstrom  [1981] 


n  m 


l.° 

2 

2 

Rosaiihmrk 

2.° 

2 

2 

Frendenstoin  and  Roth 

3.0 

2 

2 

Powell  Badly  Sealed 

4.0 

2 

3 

Brown  Badly  Scaled 

5° 

2 

3 

Beale 

8. 

2 

10 

Jennridi  and  Sampson 

7.0 

3 

3 

Helical  Valley 

8. 

3 

15 

Bard 

9. 

3 

15 

Gaussian 

10. 

3 

16 

Meyer 

11.0 

3 

10 

Gulf  Research  and  Developmentf 

12.0 

3 

10 

Box  3>  Dimensional 

13.0 

4 

4 

Powell  Singular 

14.0 

4 

6 

Wood 

15. 

4 

11 

Kowalik  and  Osborne 

16. 

4 

20 

Brown  and  Dennis 

17. 

5 

33 

Osborne  1 

18.0 

6 

13 

Biggs  BXP6t 

t  For  the  Gulf  Researdi  and  Dovclopinmt  Function  (#  11),  the  formula 

given  in  Mor£,  Garbow,  and  Hillstrom  [1981]  for  the  residual  functions  is  in  error.  The  correct 
formula  is 


d.Cr)  =  exp 

(see  Mori,  Garbow,  and  Hillstrom  [1978]). 


|y<  -  > 


-  xal'’ 


-  U 


t  For  the  Biggs  EXP6  Function  (#  18),  the  minmum  value  for  the  sum  of  squares  is  given 
in  More,  Garbow,  and  Hillstrom  [1981]  as  .9.655R5 . . .  x  10~®.  It  can  be  easily  verified  that  the 
residuals  vanish  at  several  points  (for  example  ( 1 , 10. 1 , 5, 4,  .1)). 


in 


Problems  from  Mor4,  Garbow,  and  Hillstrom  [1981]  (continued) 


19. 

n 

11 

m 

65 

Osbrrrnc  2t 

20a. 

6 

31 

Watsnn 

2nh. 

9 

31 

Watson 

20c. 

12 

31 

Watsnn 

20d. 

20 

31 

Watson 

21a.° 

10 

10 

Extended  no.*enbrock 

21b.® 

20 

20 

Extended  Ilnsenbrock 

22a.° 

12 

12 

Extended  Powell  Singular 

22b.° 

20 

20 

Extended  Powell  Singular 

23a. 

4 

5 

Penalty  I 

23b. 

10 

11 

Penalty  I 

24a. 

4 

8 

Penalty  II 

24b. 

10 

20 

Penalty  II 

25a.° 

10 

12 

Variably  Dimensioned 

25b.® 

20 

22 

Variably  Dimensioned 

2«a.® 

10 

10 

IVigonometric 

26b.® 

20 

20 

Trigonometric 

27a.® 

10 

10 

Brown  Almost  tinear 

27b.® 

20 

20 

Brown  Almost  Linear 

28a.® 

10 

10 

Discrete  Boundary  Value 

28b.° 

20 

20 

Discrete  Boundary  Value 

29a.® 

10 

10 

Discrete  Integral 

29b.® 

20 

20 

Discrete  Integral 

30a.® 

10 

10 

Broyden  TVidiagonat 

30b.® 

20 

20 

Broyden  TVidiagonal 

31a.® 

10 

10 

Broyden  Banded 

31b.® 

20 

20 

Broyden  Banded 

32.' 

10 

20 

Linear  —  Full  Rank 

33.‘ 

10 

20 

Linear  —  Bank  1 

34.*^ 

10 

20 

Linear  —  Bank  1  with  Zero  Coliinins  and  Rows 

35a. 

8 

8 

Chcliyquad 

35b.® 

9 

9 

Chebyqnad 

35c* 

10 

10 

Chobyqnad 

t  For  Osborne's  Second  Function  (#  19),  the  value  of  /(r*)  >*  given  (to  six  figures)  in  Mori, 
Garbow,  and  Hillstrom  [1981]  as  4.01377  x  10"*.  The  smallest  value  we  were  able  to  obtain  was 
4.0168.1  X  10"*. 
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Matrix  Square  Root  Problemsft 


n 

rn 

36a.° 

i 

4 

Matrix  Square  Root  1 

36b.° 

n 

9 

Matrix  Square  Root  2 

36c.® 

9 

9 

Matrix  Square  Root  3 

36rl.® 

9 

9 

Matrix  Square  Root  4 

t  These  test  problems  come  from  a  private  communication  of  S.  Hammarling  to  P.  E.  Gill  in 
1983. 


t  The  identity  matrix  was  used  as  the  starting  value  in  all  Instances.  Note  that  the  iteration 
should  not  be  started  with  the  zero  matrix  because  It  Is  a  stationary  point  of  the  sum  of  squares. 


» 


Problems  from  Salane  [1987] 
n  m 

37.  2  16  Hanson  1 

38.  3  16  Hanson  3 
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Problems  from  McKeown  [1975a]  (also  McKeown  [1975b]) 


n 

m 

/' 

39a. 

2 

3 

McKeown  1 

0.001 

39b. 

2 

3 

McKeown  1 

0,01 

39c. 

2 

3 

McKeown  1 

0.1 

39d. 

2 

3 

McKeown  1 

1.0 

39e. 

2 

3 

McKeown  1 

10.0 

39f. 

2 

3 

McKeown  1 

100.0 

39g. 

2 

3 

McKeown  1 

1000.0 

40a.t 

3 

4 

McKeown  2 

0.001 

40b.  t 

3 

4 

McKeown  2 

0.01 

40c.t 

3 

4 

McKeown  2 

0.1 

40rl.t 

3 

4 

McKeown  2 

1.0 

40e.t 

3 

4 

McKeown  2 

10.0 

40f.t 

3 

4 

McKeown  2 

100.0 

40g.t 

3 

4 

McKeown  2 

1000.0 

41r. 

5 

10 

McKeown  3 

0.001 

41b. 

5 

10 

McKeown  3 

0.01 

41c. 

5 

10 

McKeown  3 

0.1 

41d. 

5 

10 

McKeown  3 

1.0 

41e. 

5 

10 

McKeown  3 

10.0 

41f. 

5 

10 

McKeown  3 

100.0 

5 

10 

McKeown  3 

1000.0 

t  In  the  data  defining  this  problem  given  in  McKeown  [197Sa]  and  [1975b],  the  matrix 


(2.95137  4.P7107 

4.S7407  9.39321 

-2.0506  -3.93189 

is  in  error  (it  should  be  symmetric).  The  value 

(  2.95137  4.87107 

4.87407  9.39321 

-2.0506  -3.93189 


-2.0506  \ 
-3..93181  ) 
2.64715  / 


-2.0.506  \ 
-3.93189  j 
2.64745  / 


which  is  correct  to  six  decimal  digits,  was  used  in  our  formulation  of  the  problem. 


4 


I 
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Problems  from  DeVilliers  nnd  Glnsser  [1981]  (also  Salane  [1987]) 


n 

m 

starting  value 

42a.° 

•t 

24 

DeVilliers  nnd  Glasser  i 

(1.0,8.0.4.0.4.412) 

42b  ° 

4 

24 

DeVilliers  and  Glasser  1 

(1.0.  8.0. 8.0. 1.0) 

42r  ° 

4 

24 

DeVilliers  and  Glasser  1 

(1.0.8.0,1.0.4.412) 

42d  ° 

4 

24 

DeVilliers  and  Glasser  1 

(1.0, 8.0. 4.0, 1.0) 

43a.° 

5 

16 

DeVilliers  and  Glasser  2 

( 46.0. 2.0,  2..5.  l.S.O.n) 

43b.o 

5 

16 

DeVilliers  and  Glasser  2 

(42.0,0.8,1.4.1.8,1.0) 

43c.° 

5 

16 

DeVilliers  and  Glasser  2 

(4.5.0.2.0.2.1.2.0,0.<)) 

43d.‘’ 

5 

16 

DeVilliers  and  Glasser  2 

(4.'>.0,2..5, 1.7, 1.0,1. 0) 

43c.° 

5 

16 

DeVilliers  and  Glasser  2 

(3.5.0.2..5. 1.7,  l.O.l.O) 

43f° 

5 

16 

DeVilliers  and  Glasser  2 

(42.0,0.8,1.8,3.1.6,1.0) 

Problems  fVom  Dennis,  Gay,  and  Vu  [1985] 

n 

m 

starting  value 

44a.®t 

6 

6 

Exp. 

791129 

(.299.  -0.273.  -.474,  .474,  -.0892.  .0882)t 

44b.®t 

6 

6 

Exp. 

791326 

(-..3,  ..3.-1.2.2.69,  ).59,-1..5) 

44c.®t 

6 

6 

Exp. 

0121a 

(-.041,  .03,  -2..56.5.  2..56.5,  -.7.54,  .7.54)t 

44d.'’t 

6 

6 

Exp. 

0121b 

(-.0.56,  .026.  -2.991,2.991,  -..568,  ..568) 

44e.°t 

6 

6 

Exp. 

0121c 

(-.074,  .013.  -3.6.32, 3.6.32,  -.289,  .289) 

45a.° 

8 

8 

Exp. 

791129 

(.299,  .186.  -0.273.  .02.54,  -0.474,  -.0892,  .0892)t 

45b.'’ 

8 

8 

Exp. 

791336 

(-.3.  -..39,  .3.  -.314.  -1.2.  2.69, 1.. 59,  -1.. 5) 

45c.'’ 

8 

8 

Exp. 

0121a 

(-.041.  -.77.5,  .03.  -.047.  -2..56.5,  2..56.5.  -.7.54,  .7.54 )J 

45d.'’ 

8 

8 

Exp.  ni21b 

(-.0.56.  -.7.53,  .026,  -  .017.  -2.991. 2.991,  -  ..568,  .568) 

45e.'’ 

8 

8 

Exp. 

0121c 

(-.074,  -.7.33,  .013.  -.034,  -3.632.  3.632.  -.289.  .289) 

t  Variables  73  and  74  (i  and  d  in  Dennis,  Gay,  and  Vu  [1985]}  are  eliminated  from  the  linear 
constraints  in  order  to  (et  the  8-variable  formulation  of  the  problem  (see  Dennis,  Gay,  and  Vu 
11985]). 


}  Specification  of  some  starting  values  in  Dennis,  Gay,  and  Vu  [1985]  is  incomplete.  The  correct 
values  were  obtained  from  D.  M.  Gay  in  1986. 
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